Magdoll / cDNA_Cupcake

Miscellaneous collection of Python and R scripts for processing Iso-Seq data
BSD 3-Clause Clear License

List index out of range error in collapse_isoforms_by_sam.py #74

Open hvbakel opened 5 years ago

hvbakel commented 5 years ago

Dear Liz, When running cDNA_cupcake, I'm encountering the following error:

Traceback (most recent call last):
  File "/hpc/users/pintod02/.conda/envs/pbisoseq/bin/collapse_isoforms_by_sam.py", line 4, in <module>
    __import__('pkg_resources').run_script('cupcake==7.0', 'collapse_isoforms_by_sam.py')
  File "/hpc/users/pintod02/.conda/envs/pbisoseq/lib/python2.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/hpc/users/pintod02/.conda/envs/pbisoseq/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1453, in run_script
    exec(code, namespace, namespace)
  File "/hpc/users/pintod02/.conda/envs/pbisoseq/lib/python2.7/site-packages/cupcake-7.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 248, in <module>
    main(args)
  File "/hpc/users/pintod02/.conda/envs/pbisoseq/lib/python2.7/site-packages/cupcake-7.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 207, in main
    collapse_fuzzy_junctions(f_good.name, f_txt.name, args.allow_extra_5exon, internal_fuzzy_max_dist=args.max_fuzzy_junction)
  File "/hpc/users/pintod02/.conda/envs/pbisoseq/lib/python2.7/site-packages/cupcake-7.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 152, in collapse_fuzzy_junctions
    _size = get_fl_from_id(group_info[pbid])
  File "/hpc/users/pintod02/.conda/envs/pbisoseq/lib/python2.7/site-packages/cupcake-7.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 92, in get_fl_from_id
    return sum(int(_id.split('/')[1].split('p')[0][1:]) for _id in members)
  File "/hpc/users/pintod02/.conda/envs/pbisoseq/lib/python2.7/site-packages/cupcake-7.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 92, in <genexpr>
    return sum(int(_id.split('/')[1].split('p')[0][1:]) for _id in members)
IndexError: list index out of range

Any idea what the issue could be?

Magdoll commented 5 years ago

This is usually a sequence ID mismatch issue. That said, you are on an older version of Cupcake. Can you please update to the latest (v7.5) first? If the error remains, can you share the command you used and the sequence format? (Just give me the first 5 sequence IDs or so.)
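
For anyone who wants to check their IDs before rerunning, here is a minimal sketch that applies the same parsing expression shown in the traceback (`_id.split('/')[1].split('p')[0][1:]`) to a list of sequence IDs and reports the ones that would fail. The helper names and the example IDs are hypothetical and only illustrate the shape the expression expects; this is not part of Cupcake.

```python
def fl_count_from_id(seq_id):
    """Return the count encoded in seq_id, or None if the ID does not fit.

    Uses the same expression as the traceback: take the second '/'-separated
    field, keep the text before the first 'p', and drop its first character.
    """
    try:
        return int(seq_id.split('/')[1].split('p')[0][1:])
    except (IndexError, ValueError):
        # IndexError: no second '/'-separated field (the error in this issue).
        # ValueError: the field exists but does not contain a number there.
        return None


def find_bad_ids(seq_ids):
    """Return the IDs that the expression above cannot parse."""
    return [s for s in seq_ids if fl_count_from_id(s) is None]


# Hypothetical IDs, for illustration only.
ids = ["sample|c1/f3p0/1200", "transcript_0"]
print(find_bad_ids(ids))
```

Running this over the first few IDs of the input file should show quickly whether the headers have the form that this version of the script assumes.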

dtlyfoung commented 4 years ago

I am getting the same thing. Traceback below:

Traceback (most recent call last):
  File "/opt/conda/bin/collapse_isoforms_by_sam.py", line 235, in <module>
    main(args)
  File "/opt/conda/bin/collapse_isoforms_by_sam.py", line 185, in main
    for recs in iter: # recs is {'+': list of list of records, '-': list of list of records}
  File "/opt/conda/lib/python3.7/site-packages/cupcake/tofu/branch/branch_simple2.py", line 81, in iter_gmap_sam
    records = [next(quality_alignments)]
  File "/opt/conda/lib/python3.7/site-packages/cupcake/tofu/branch/branch_simple2.py", line 108, in get_quality_alignments
    for r in gmap_sam_reader:
  File "/opt/conda/lib/python3.7/site-packages/cupcake/io/BioReaders.py", line 377, in __next__
    return GMAPSAMRecord(line, self.ref_len_dict, self.query_len_dict)
  File "/opt/conda/lib/python3.7/site-packages/cupcake/io/BioReaders.py", line 182, in __init__
    self.process(record_line, ref_len_dict, query_len_dict)
  File "/opt/conda/lib/python3.7/site-packages/cupcake/io/BioReaders.py", line 400, in process
    self.sID = raw[2]
IndexError: list index out of range

Here is my command:

collapse_isoforms_by_sam.py --input m64120_200619_171832.flnc.clustered.fasta -s mapped_isoseq_reads/m64120_200619_171832.flnc.clustered.fasta.sorted.sam --dun-merge-5-shorter -o m64120_200619_171832

Does collapse_isoforms_by_sam.py take fasta as input?

dtlyfoung commented 4 years ago

My issue ended up being an upstream isoseq3 issue. I was on version 3.2.2 and upgraded to 3.3.0. It must have been something in the headers produced by v3.2.2 versus v3.3.0.
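
The second traceback fails at `self.sID = raw[2]`, which suggests (as an assumption, since the reader code is not shown here) that an alignment line was split into fewer than three fields. The SAM format requires 11 tab-separated mandatory fields per alignment line, so a rough pre-check like the sketch below can locate malformed lines before running the collapse step. The function name and the in-memory example are hypothetical.

```python
def find_short_sam_lines(lines, min_fields=11):
    """Yield (line_number, field_count) for alignment lines with too few fields.

    Header lines (starting with '@') are skipped. A blank line counts as a
    single empty field, so it is reported as well.
    """
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("@"):
            continue
        n_fields = len(line.split("\t"))
        if n_fields < min_fields:
            yield lineno, n_fields


# Small in-memory example; the records are made up for illustration.
example = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\n",
    "read2\t0\n",
]
print(list(find_short_sam_lines(example)))
```

To check a real file, pass an open file handle instead of the list, for example `with open(sam_path) as fh: print(list(find_short_sam_lines(fh)))`, where `sam_path` is whatever path was given to `-s`.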