BIMSBbioinfo / janggu

Deep learning infrastructure for genomics
GNU General Public License v3.0
254 stars 33 forks source link

Add python 3.8 compatibility #16

Closed bruceoutdoors closed 3 years ago

bruceoutdoors commented 3 years ago

Update pysam <=0.15.3 is incompatible with python 3.8 and above https://github.com/BIMSBbioinfo/janggu/pull/16#issuecomment-805835932

Is there a reason why pysam needs to be <=0.15.3? I wasn't able to build pysam in Ubuntu 20.10, but pysam was able to build when I bump pysam to the latest version (0.16.1)

$ lsb_release -a
No LSB modules are available.
Distributor ID: Pop
Description:    Pop!_OS 20.10
Release:    20.10
Codename:   groovy
$ uname -r
5.8.0-7642-generic

Here is the error output for the 0.15.3 build. It is quite large so I truncate it here, but it's gcc related

    pysam/libcsamtools.c:318:11: error: too many arguments to function ‘PyCode_New’
      318 |           PyCode_New(a, 0, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos)
          |           ^~~~~~~~~~
    pysam/libcsamtools.c:1668:15: note: in expansion of macro ‘__Pyx_PyCode_New’
     1668 |     py_code = __Pyx_PyCode_New(
          |               ^~~~~~~~~~~~~~~~
    In file included from /home/bruce/anaconda3/include/python3.8/compile.h:5,
                     from /home/bruce/anaconda3/include/python3.8/Python.h:138,
                     from pysam/libcsamtools.c:4:
    /home/bruce/anaconda3/include/python3.8/code.h:122:28: note: declared here
      122 | PyAPI_FUNC(PyCodeObject *) PyCode_New(
          |                            ^~~~~~~~~~
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
wkopp commented 3 years ago

I'll have a look into the compatibility with other pysam versions. Regarding the pysam compilation issue, I'd suggest to create an issue at the pysam github repository directly, unless you were able to solve it in the meantime.

bruceoutdoors commented 3 years ago

I don't think adding an issue to pysam is necessary since it compiles in v0.16.x. It's only failing in <=0.15.x.

I've went through their GitHub releases page and didn't find a change they did in 0.16 that resolved this, but I'll take another look.

Is the unit tests failing due to this new version of pysam? I can't seem to tell from the output.

wkopp commented 3 years ago

Hi @bruceoutdoors,

the current issue in the unit tests is due to an unrelated external issue, caused by h5py and tensorflow. https://github.com/tensorflow/tensorflow/issues/44467.

However, the unit tests are still running on pysam==0.15 here (see travis logs), because the pysam version is also specified in the tox.ini file that defines the testing environments. You'd have to also remove the pysam version restriction in tox.ini for the unit tests to use pysam>=0.16. I don't have access to Ubuntu 20, but on Ubuntu 18, using pysam>=0.16 does raise an issue:

    length = aln_file.header.get_reference_length(process_chrom) + tmp_gsize.flank
pysam/libcalignmentfile.pyx:505: in pysam.libcalignmentfile.AlignmentHeader.get_reference_length
    ???
E   KeyError: 'unknown reference chr1'

Not sure yet why that is, but this was a reason to restrict the pysam version. If you are aware of a solution, I'd be happy to adjust the code accordingly.

Best, Wolfgang

bruceoutdoors commented 3 years ago

Not just in Ubuntu 18. I got a similar error in Ubuntu 20 when running this particular example in the janggu tutorial:

from janggu.data import Cover

bam_file = resource_filename('janggu',
                             'resources/sample.bam')

roi = resource_filename('janggu',
                        'resources/sample.bed')

cover = Cover.create_from_bam('read_count_coverage',
                              bamfiles=bam_file,
                              binsize=200,
                              stepsize=200,
                              roi=roi)

Outputs (truncated):

~/anaconda3/lib/python3.8/site-packages/janggu/data/coverage.py in __call__(self, garray)
    163                     continue
    164                 tmp_gsize = gsize.filter_by_region(include=process_chrom)
--> 165                 length = aln_file.header.get_reference_length(process_chrom) + tmp_gsize.flank
    166 
    167                 array = np.zeros((length, 2), dtype=dtype)

pysam/libcalignmentfile.pyx in pysam.libcalignmentfile.AlignmentHeader.get_reference_length()

KeyError: 'unknown reference chr2'

I haven't figure out why it couldn't process the BAM files either.

bruceoutdoors commented 3 years ago

@wkopp ok I figured out why pysam 0.15.3 couldn't build in Ubuntu 20. It was because it comes with python 3.8. pysam 0.15.4 and onwards solves this. This resolves the 'unknown reference chr2' we had earlier as well. As per their release https://github.com/pysam-developers/pysam/releases/tag/v0.15.4 :

Bugfix release. Principal reason for release is to update cython version in order to fix pip install pysam with python 3.8.

wkopp commented 3 years ago

Hi @bruceoutdoors,

thank you very much for reporting and fixing the pysam==0.15.3 issue in python 3.8. I appreciate it. I slightly modified the dependency in the setup.py and tox.ini to pysam<0.16,!=0.15.3 to exclude the problematic pysam version while being less restrictive in case there are future versions of pysam (e.g. pysam>0.15.4).

I'll adopt this PR,

Best, Wolfgang