creisle closed this issue 2 years ago
When I try this with pysam 0.15.4 I see a different error, this time in the Manta VCF files:
============================= test session starts ==============================
platform linux -- Python 3.7.2, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /projects/dat/workspace/creisle/mavis
plugins: cov-3.0.0
collected 11 items
tests/end_to_end/test_convert.py ...F....... [100%]
=================================== FAILURES ===================================
____________________________ TestConvert.test_manta ____________________________
self = <tests.end_to_end.test_convert.TestConvert object at 0x7f7e34e2c588>
def test_manta(self):
> result = self.run_main(get_data('manta_events.vcf'), SUPPORTED_TOOL.MANTA, False)
tests/end_to_end/test_convert.py:81:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/end_to_end/test_convert.py:41: in run_main
main()
src/mavis/main.py:292: in main
raise err
src/mavis/main.py:264: in main
args.assume_no_untemplated,
src/mavis/main.py:36: in convert_main
assume_no_untemplated=assume_no_untemplated,
src/mavis/tools/__init__.py:35: in convert_tool_output
fname, file_type, stranded, log, assume_no_untemplated=assume_no_untemplated
src/mavis/tools/__init__.py:291: in _convert_tool_output
rows = read_vcf(input_file, file_type, log)
src/mavis/tools/vcf.py:204: in convert_file
for vcf_record in vfile.fetch():
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E OSError: unable to parse next record
pysam/libcbcf.pyx:4108: OSError
----------------------------- Captured stderr call -----------------------------
[W::vcf_parse] Contig '1 17051724 MantaBND:207:0:1:0:0:0:0 C [1:234912188[GCCCCATC 36 PASS SVTYPE=BND;MATEID=MantaBND:207:0:1:0:0:0:1;SVINSLEN=7;SVINSSEQ=GCCCCAT;BND_DEPTH=5;MATE_BND_DEPTH=4 GT:FT:GQ:PL:PR:SR 0/1:PASS:30:86,0,28:1,2:3,1 . . .' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::bcf_hdr_parse_line] Could not parse the header line: "##contig=<ID=1 17051724 MantaBND:207:0:1:0:0:0:0 C [1:234912188[GCCCCATC 36 PASS SVTYPE=BND;MATEID=MantaBND:207:0:1:0:0:0:1;SVINSLEN=7;SVINSSEQ=GCCCCAT;BND_DEPTH=5;MATE_BND_DEPTH=4 GT:FT:GQ:PL:PR:SR 0/1:PASS:30:86,0,28:1,2:3,1 . . .>"
[E::vcf_parse] Could not add dummy header for contig '1 17051724 MantaBND:207:0:1:0:0:0:0 C [1:234912188[GCCCCATC 36 PASS SVTYPE=BND;MATEID=MantaBND:207:0:1:0:0:0:1;SVINSLEN=7;SVINSSEQ=GCCCCAT;BND_DEPTH=5;MATE_BND_DEPTH=4 GT:FT:GQ:PL:PR:SR 0/1:PASS:30:86,0,28:1,2:3,1 . . .'
=============================== warnings summary ===============================
venv3.7/lib/python3.7/site-packages/Bio/Alphabet/__init__.py:27
/projects/dat/workspace/creisle/mavis/venv3.7/lib/python3.7/site-packages/Bio/Alphabet/__init__.py:27: PendingDeprecationWarning: We intend to remove or replace Bio.Alphabet in 2020, ideally avoid using it explicitly in your code. Please get in touch if you will be adversely affected by this. https://github.com/biopython/biopython/issues/2046
PendingDeprecationWarning,
src/mavis/schemas/__init__.py:7
/projects/dat/workspace/creisle/mavis/src/mavis/schemas/__init__.py:7: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
class ImmutableDict(collections.Mapping):
tests/end_to_end/test_convert.py::TestConvert::test_breakdancer
/projects/dat/workspace/creisle/mavis/src/mavis/tools/breakdancer.py:40: FutureWarning: The default value of regex will change from True to False in a future version.
df['num_Reads_lib'] = df['num_Reads_lib'].str.replace(bam, lib)
-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================== short test summary info ============================
FAILED tests/end_to_end/test_convert.py::TestConvert::test_manta - OSError: u...
=================== 1 failed, 10 passed, 3 warnings in 2.50s ===================
Note: I install pysam with the following environment variable set (taken from setup.py):
export HTSLIB_CONFIGURE_OPTIONS='--disable-lzma --disable-bz2 --disable-libcurl'
Currently I am testing on the develop_v3 branch.
Note: the environment options do not appear to have any effect (probably because pysam is installed from a wheel, so it does not have to build from source). I will be leaving these options off for future runs.
For the delly error, it looks like it has something to do with this warning in the output:
[W::vcf_parse_info] INFO/END=7059510 is smaller than POS at 19:17396810
However, this should not be an issue since the event is a translocation, so END is a position on the partner chromosome and is not expected to be larger than POS.
Looks like it may be related to this issue: https://github.com/samtools/bcftools/issues/1154
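To make the point about translocations concrete, here is a minimal sketch of the check I would expect: END < POS is only suspicious when both breakpoints are on the same chromosome. The function name is hypothetical, and the mate chromosome '10' is a made-up value (delly reports the partner chromosome in INFO, assumed here to be available separately); POS and END are taken from the warning above.

```python
def end_before_pos_is_suspicious(chrom, pos, end, mate_chrom=None):
    """Return True only when END < POS cannot be explained by a different mate chromosome.

    Hypothetical helper for illustration; not part of MAVIS, pysam, or htslib.
    """
    if end >= pos:
        return False
    # With no mate chromosome, or the same one, END < POS really is inconsistent.
    return mate_chrom is None or mate_chrom == chrom


# POS and END come from the warning; the mate chromosome '10' is an assumption.
print(end_before_pos_is_suspicious('19', 17396810, 7059510, mate_chrom='10'))
print(end_before_pos_is_suspicious('19', 17396810, 7059510, mate_chrom='19'))
```

The first call treats the record as fine because the mate is on another chromosome; the second flags it because both ends would be on chromosome 19.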
Currently MAVIS is frozen to pysam 0.15.2 because there are some unexpected bugs in later versions. However, this is getting harder and harder to support, as setuptools/pip has issues with these older versions.
I have commented on this ticket, but it was created in 2017 and there has been little to no movement: https://github.com/pysam-developers/pysam/issues/527
When we run the tests with the newer versions, we see the following error:
FAILED tests/end_to_end/test_convert.py::TestConvert::test_delly - assert 17396140 == (7059510 - 670)
After some preliminary debugging, it seems that the END tag of the INFO column is being ignored. In previous versions of pysam (<=0.15.2) it would have been used to set the "stop" field on VariantRecord. Now, however, it is simply dropped.
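The failing assertion is consistent with this: 17396810 - 670 = 17396140, so the stop appears to be derived from POS rather than from END. One possible way to confirm what END the file actually carries, independent of pysam, is to read it from the raw INFO text. This is only a sketch: `parse_info` and `record_end` are hypothetical helpers, and the example record is made up (only POS and END come from the htslib warning quoted earlier in this thread).

```python
def parse_info(info_text):
    """Split a raw VCF INFO column into a dict; flag entries map to True."""
    info = {}
    for entry in info_text.split(';'):
        if not entry or entry == '.':
            continue
        key, sep, value = entry.partition('=')
        info[key] = value if sep else True
    return info


def record_end(vcf_line):
    """Return the END value from a tab-separated VCF data line, or None if absent."""
    columns = vcf_line.rstrip('\n').split('\t')
    info = parse_info(columns[7])  # INFO is the 8th VCF column
    end = info.get('END')
    return int(end) if end is not None else None


# Hypothetical delly-style record; ID, ALT, and CHR2 are invented for illustration.
line = '19\t17396810\tTRA0001\tN\t<TRA>\t.\tPASS\tSVTYPE=TRA;CHR2=10;END=7059510;PRECISE'
print(record_end(line))
```

Comparing this value against `VariantRecord.stop` for the same record would show whether the newer pysam versions are discarding END.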
Adding a summary of testing here (Python 3.7 was used; I will test other versions once this one is working):