SUwonglab / arcsv

Complex structural variant detection from WGS data
MIT License

`ValueError: start out of range (-1)` and `imp module is deprecated` #3

Closed: carleshf closed this issue 5 years ago

carleshf commented 5 years ago

I have been using arcsv to genotype SVs in a series of samples aligned with BWA without problems. But now I'm processing a series of samples generated with 10X and aligned with emerald, and I got the following error:

/home/carleshf/miniconda2/envs/py36/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[run] ref files {'reference': '/media/NFS2/refdata-b37-2.1.0/fasta/genome.fa', 'gap': '/media/NFS/Carles/SV/tools/arcsv/resources/GRCh37_gap.bed'}
[run] calling SVs in 2:0-243199373

Traceback (most recent call last):
  File "/home/carleshf/miniconda2/envs/py36/bin/arcsv", line 156, in <module>
    main()
  File "/home/carleshf/miniconda2/envs/py36/bin/arcsv", line 26, in main
    run(args)
  File "/home/carleshf/miniconda2/envs/py36/lib/python3.6/site-packages/arcsv/call_sv.py", line 93, in run
    call_sv(opts, inputs, reference_files)
  File "/home/carleshf/miniconda2/envs/py36/lib/python3.6/site-packages/arcsv/call_sv.py", line 161, in call_sv
    pb_out = parse_bam(opts, reference_files, bamfiles)
  File "/home/carleshf/miniconda2/envs/py36/lib/python3.6/site-packages/arcsv/bamparser_streaming.py", line 109, in parse_bam
    bam_has_unmapped = has_unmapped_records(bam)
  File "/home/carleshf/miniconda2/envs/py36/lib/python3.6/site-packages/arcsv/bamparser_streaming.py", line 491, in has_unmapped_records
    if any([a.is_unmapped and a.qname == aln.qname for a in alns]):
  File "/home/carleshf/miniconda2/envs/py36/lib/python3.6/site-packages/arcsv/bamparser_streaming.py", line 491, in <listcomp>
    if any([a.is_unmapped and a.qname == aln.qname for a in alns]):
  File "/home/carleshf/miniconda2/envs/py36/lib/python3.6/site-packages/arcsv/bamparser_streaming.py", line 430, in <genexpr>
    return itertools.chain.from_iterable(b.fetch(*o1, **o2) for b in self.bamlist)
  File "pysam/libcalignmentfile.pyx", line 855, in pysam.libcalignmentfile.AlignmentFile.fetch (pysam/libcalignmentfile.c:11188)
  File "pysam/libcalignmentfile.pyx", line 783, in pysam.libcalignmentfile.AlignmentFile.parse_region (pysam/libcalignmentfile.c:10755)
ValueError: start out of range (-1)

I don't think the warning has any impact on the caller, but I don't understand what the problem with the BAM files is. Any help is welcome!
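The log supports that reading of the `imp` message: it is a `DeprecationWarning` emitted when scikit-learn's bundled `cloudpickle` imports `imp`, and it appears before arcsv prints its first `[run]` line. If it clutters the output, Python's standard `warnings` module can filter just that message. The sketch below is an assumption about how one might do this in a wrapper script; arcsv itself does not provide it, and the helper name `silence_imp_deprecation` is hypothetical.

```python
import warnings


def silence_imp_deprecation() -> None:
    """Hypothetical helper: ignore only the 'imp module is deprecated' warning.

    `message` is a regex matched against the start of the warning text,
    so any other DeprecationWarning is still reported.
    """
    warnings.filterwarnings(
        "ignore",
        message="the imp module is deprecated",
        category=DeprecationWarning,
    )
```

The filter only takes effect for warnings raised after it is installed, so it would have to run before the import that pulls in scikit-learn. Without touching any code, the standard environment variable `PYTHONWARNINGS=ignore::DeprecationWarning` has a similar (broader) effect from the shell.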

jgarthur commented 5 years ago

Hi Carles, the issue should now be fixed. The problem (I think) was that emerald produces SAM records with the "mate position" field set to -1, which tripped up some unimportant code.
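One way to make this kind of lookup tolerant of such records is to skip the mate query whenever the mate position is negative, rather than handing a negative start to `fetch()`. The sketch below is hypothetical and is not arcsv's actual fix: it assumes the caller builds a `(contig, start, end)` window from the mate's 0-based position. In the SAM text format, `PNEXT` (column 8) is 1-based with 0 meaning "unavailable", so converting it to 0-based yields -1 in exactly that case. The names `mate_region`, `mate_region_from_sam_line` and `flank` are made up for illustration.

```python
from typing import Optional, Tuple

# (contig, 0-based start, 0-based exclusive end)
Region = Tuple[str, int, int]


def mate_region(mate_contig: Optional[str], mate_start: int, flank: int = 0) -> Optional[Region]:
    """Build a query window around a mate's 0-based start.

    Returns None when the mate contig is unknown or the mate position is
    negative (e.g. -1), so the caller can skip the query entirely.
    """
    if mate_contig is None or mate_start < 0:
        return None
    start = max(0, mate_start - flank)  # clamp so the window never starts below 0
    end = mate_start + flank + 1
    return (mate_contig, start, end)


def mate_region_from_sam_line(line: str, flank: int = 0) -> Optional[Region]:
    """Same check, starting from one tab-separated SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    rname, rnext, pnext = fields[2], fields[6], int(fields[7])
    contig = rname if rnext == "=" else rnext  # "=" means same contig as the read
    if contig == "*":
        return None
    return mate_region(contig, pnext - 1, flank)  # PNEXT is 1-based; 0 -> -1
```

A caller would then do something like `region = mate_region(...)` and only query the BAM when `region is not None`, treating records without a usable mate position as "nothing to check".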