bcgsc / mavis

Merging, Annotation, Validation, and Illustration of Structural variants
http://mavis.bcgsc.ca
GNU General Public License v3.0

Upgrade Pysam #265

Closed creisle closed 2 years ago

creisle commented 2 years ago

Currently MAVIS is frozen to pysam 0.15.2 because later versions have some unexpected bugs. However, this is getting harder and harder to support, as setuptools/pip has issues with these older versions.

I have commented on this ticket but it was created in 2017 and there has been little to no movement: https://github.com/pysam-developers/pysam/issues/527

When we run the tests with the newer versions, we see the following error:

FAILED tests/end_to_end/test_convert.py::TestConvert::test_delly - assert 17396140 == (7059510 - 670)

After some preliminary debugging, it looks like the END tag of the INFO column is being ignored. In previous versions of pysam (<=0.15.2) it was used to set the "stop" field on VariantRecord. Now, however, it is simply dropped.
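
As a way to check what END value a record actually carries, independent of how pysam populates "stop", here is a minimal sketch that reads INFO/END straight from a raw VCF data line. The helper names (`parse_info`, `info_end`) and the sample record are hypothetical; only the POS and END values come from the htslib warning reported further down this thread. It is not how MAVIS or pysam read the field.

```python
from typing import Dict, Optional


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string such as 'SVTYPE=TRA;END=7059510' into a dict."""
    fields: Dict[str, str] = {}
    if info in ("", "."):
        return fields
    for item in info.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        fields[key] = value  # flag entries (no '=') map to an empty string
    return fields


def info_end(vcf_line: str) -> Optional[int]:
    """Return INFO/END from one tab-delimited VCF data line, or None if absent."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the 8th fixed VCF column
    return int(info["END"]) if "END" in info else None


# Hypothetical record: the ID and ALT are made up for illustration;
# POS and END are the values quoted in the htslib warning in this thread.
line = "19\t17396810\tTRA0001\tN\t<TRA>\t.\tPASS\tSVTYPE=TRA;END=7059510"
print(info_end(line))
```

Comparing this value against the "stop" reported by the installed pysam version would show whether END is being dropped for a given record.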

Adding a summary of testing here (Python 3.7 was used; I will test other versions once this one is working):

| Python version | pysam version | HTSlib disable flags | Tested | Error message |
|---|---|---|---|---|
| 3.7 | 0.15.2 | lzma; bz2; libcurl | :heavy_check_mark: | |
| 3.7 | 0.15.3 | lzma; bz2; libcurl | :heavy_check_mark: | FAILED tests/end_to_end/test_convert.py::TestConvert::test_delly - assert 17396140 == (7059510 - 670) |
| 3.7 | 0.15.4 | lzma; bz2; libcurl | :heavy_check_mark: | E OSError: unable to parse next record |
| 3.7 | 0.16.0 | lzma; bz2; libcurl | :heavy_check_mark: | E ImportError: libchtslib.cpython-37m-x86_64-linux-gnu.so: cannot open shared object file: No such file or directory |
| 3.7 | 0.16.0.1 | lzma; bz2; libcurl | :heavy_check_mark: | E OSError: unable to parse next record; FAILED tests/end_to_end/test_convert.py::TestConvert::test_delly - assert 17396140 == (7059510 - 670) |
| 3.7 | 0.16.0.1 | bz2; libcurl | :heavy_check_mark: | same as above |
| 3.7 | 0.17.0 | lzma; bz2; libcurl | :heavy_check_mark: | same as above |
| 3.7 | 0.18.0 | lzma; bz2; libcurl | :heavy_check_mark: | same as above |

creisle commented 2 years ago

When I try this with pysam 0.15.4, I see a different error, this time in the Manta VCF files:

============================= test session starts ==============================
platform linux -- Python 3.7.2, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /projects/dat/workspace/creisle/mavis
plugins: cov-3.0.0
collected 11 items

tests/end_to_end/test_convert.py ...F.......                             [100%]

=================================== FAILURES ===================================
____________________________ TestConvert.test_manta ____________________________

self = <tests.end_to_end.test_convert.TestConvert object at 0x7f7e34e2c588>

    def test_manta(self):
>       result = self.run_main(get_data('manta_events.vcf'), SUPPORTED_TOOL.MANTA, False)

tests/end_to_end/test_convert.py:81: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/end_to_end/test_convert.py:41: in run_main
    main()
src/mavis/main.py:292: in main
    raise err
src/mavis/main.py:264: in main
    args.assume_no_untemplated,
src/mavis/main.py:36: in convert_main
    assume_no_untemplated=assume_no_untemplated,
src/mavis/tools/__init__.py:35: in convert_tool_output
    fname, file_type, stranded, log, assume_no_untemplated=assume_no_untemplated
src/mavis/tools/__init__.py:291: in _convert_tool_output
    rows = read_vcf(input_file, file_type, log)
src/mavis/tools/vcf.py:204: in convert_file
    for vcf_record in vfile.fetch():
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   OSError: unable to parse next record

pysam/libcbcf.pyx:4108: OSError
----------------------------- Captured stderr call -----------------------------
[W::vcf_parse] Contig '1    17051724   MantaBND:207:0:1:0:0:0:0    C   [1:234912188[GCCCCATC   36  PASS    SVTYPE=BND;MATEID=MantaBND:207:0:1:0:0:0:1;SVINSLEN=7;SVINSSEQ=GCCCCAT;BND_DEPTH=5;MATE_BND_DEPTH=4 GT:FT:GQ:PL:PR:SR 0/1:PASS:30:86,0,28:1,2:3,1   .   .   .' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::bcf_hdr_parse_line] Could not parse the header line: "##contig=<ID=1    17051724   MantaBND:207:0:1:0:0:0:0    C   [1:234912188[GCCCCATC   36  PASS    SVTYPE=BND;MATEID=MantaBND:207:0:1:0:0:0:1;SVINSLEN=7;SVINSSEQ=GCCCCAT;BND_DEPTH=5;MATE_BND_DEPTH=4 GT:FT:GQ:PL:PR:SR 0/1:PASS:30:86,0,28:1,2:3,1   .   .   .>"
[E::vcf_parse] Could not add dummy header for contig '1    17051724   MantaBND:207:0:1:0:0:0:0    C   [1:234912188[GCCCCATC   36  PASS    SVTYPE=BND;MATEID=MantaBND:207:0:1:0:0:0:1;SVINSLEN=7;SVINSSEQ=GCCCCAT;BND_DEPTH=5;MATE_BND_DEPTH=4 GT:FT:GQ:PL:PR:SR 0/1:PASS:30:86,0,28:1,2:3,1   .   .   .'
=============================== warnings summary ===============================
venv3.7/lib/python3.7/site-packages/Bio/Alphabet/__init__.py:27
  /projects/dat/workspace/creisle/mavis/venv3.7/lib/python3.7/site-packages/Bio/Alphabet/__init__.py:27: PendingDeprecationWarning: We intend to remove or replace Bio.Alphabet in 2020, ideally avoid using it explicitly in your code. Please get in touch if you will be adversely affected by this. https://github.com/biopython/biopython/issues/2046
    PendingDeprecationWarning,

src/mavis/schemas/__init__.py:7
  /projects/dat/workspace/creisle/mavis/src/mavis/schemas/__init__.py:7: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    class ImmutableDict(collections.Mapping):

tests/end_to_end/test_convert.py::TestConvert::test_breakdancer
  /projects/dat/workspace/creisle/mavis/src/mavis/tools/breakdancer.py:40: FutureWarning: The default value of regex will change from True to False in a future version.
    df['num_Reads_lib'] = df['num_Reads_lib'].str.replace(bam, lib)

-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================== short test summary info ============================
FAILED tests/end_to_end/test_convert.py::TestConvert::test_manta - OSError: u...
=================== 1 failed, 10 passed, 3 warnings in 2.50s ===================
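
One possible reading of the stderr above is that the whole record was taken as the contig name, which would be the case if the parser found no tab character in that line. This is an assumption and is not confirmed anywhere in this thread; the spacing in the log may simply be how the message was rendered. A minimal sketch for checking a VCF for data lines that lack tabs (the function name `lines_without_tabs` and the shortened sample records are hypothetical):

```python
from typing import List, Tuple


def lines_without_tabs(vcf_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for non-header lines containing no tab."""
    problems = []
    for number, line in enumerate(vcf_text.splitlines(), start=1):
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        if "\t" not in line:
            problems.append((number, line))
    return problems


# Shortened, illustrative records: line 3 is tab-delimited, line 4 uses spaces.
sample = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t17051724\tMantaBND:207:0:1:0:0:0:0\tC\t[1:234912188[GCCCCATC\t36\tPASS\tSVTYPE=BND\n"
    "1    17051724   MantaBND:207:0:1:0:0:0:0    C   [1:234912188[GCCCCATC   36  PASS    SVTYPE=BND\n"
)

for number, line in lines_without_tabs(sample):
    print(f"line {number} has no tab characters")
```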

Note: I install pysam with the following environment flag (from the setup.py):

export HTSLIB_CONFIGURE_OPTIONS='--disable-lzma --disable-bz2 --disable-libcurl'

Currently I am testing on the develop_v3 branch.

creisle commented 2 years ago

Note: the environment options do not appear to have any effect (probably because pysam is being installed from a wheel, so it is never built from source). I will be leaving these options off for future runs.
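
If the HTSlib options ever did need to take effect, pip would have to build pysam from source instead of using a prebuilt wheel. A minimal sketch of how such an install command could be assembled, assuming pip's `--no-binary` option is what forces the source build; the function name `source_install` is hypothetical and the command is only built here, not run:

```python
import os
import sys
from typing import Dict, List, Tuple


def source_install(version: str) -> Tuple[List[str], Dict[str, str]]:
    """Build a pip command and environment for a from-source pysam install."""
    env = dict(os.environ)
    # Same options as the export shown earlier in this thread.
    env["HTSLIB_CONFIGURE_OPTIONS"] = "--disable-lzma --disable-bz2 --disable-libcurl"
    cmd = [
        sys.executable, "-m", "pip", "install",
        "--no-binary", "pysam",  # ask pip not to use a prebuilt pysam wheel
        f"pysam=={version}",
    ]
    return cmd, env


cmd, env = source_install("0.15.4")
print(" ".join(cmd[1:]))
# To actually run it: subprocess.run(cmd, env=env, check=True)
```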

creisle commented 2 years ago

For the delly error, it looks like it has something to do with this warning in the output:

[W::vcf_parse_info] INFO/END=7059510 is smaller than POS at 19:17396810

However, this should not be an issue, since it is a translocation.

Looks like it may be related to this issue: https://github.com/samtools/bcftools/issues/1154
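
The numbers in the failing assertion line up with that warning: 17396140 equals 17396810 - 670, so the computed value appears to be derived from POS rather than from INFO/END. The arithmetic is certain; the interpretation is an assumption. A short sketch of the check, using only values quoted in this thread (the helper `end_before_pos` is hypothetical):

```python
POS = 17396810      # from the warning: "... smaller than POS at 19:17396810"
INFO_END = 7059510  # from the same warning: INFO/END=7059510
OFFSET = 670        # offset that appears in the test assertion

observed = 17396140           # left-hand side of the failing assertion
expected = INFO_END - OFFSET  # right-hand side of the failing assertion


def end_before_pos(pos: int, end: int) -> bool:
    """True when INFO/END is smaller than POS, the condition htslib warns about."""
    return end < pos


print(observed == POS - OFFSET)       # the observed value matches POS - 670
print(observed == expected)           # and does not match END - 670
print(end_before_pos(POS, INFO_END))  # the record triggers the warning condition
```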