JiekaiLab / scTE

MIT License

IndexError: list index out of range #41

Open ShambaMondal opened 1 year ago

ShambaMondal commented 1 year ago

Hi,

I ran scTE with the following options, and it threw an IndexError. The details are pasted below. Could you please help? The Python version is 3.9. The mm39 index was created successfully with scTE_build (I had added the option for it in your code). A BAM file generated with cellranger v7.0 was used as input.

================

    $ scTE -i input.bam -o out -x mm39.exclusive.idx -p 80 -CB CB -UMI UB
    DEBUG : Creating converter from 7 to 5
    DEBUG : Creating converter from 5 to 7
    DEBUG : Creating converter from 7 to 5
    DEBUG : Creating converter from 5 to 7
    INFO : Parameter list:
    Sample = out
    Reference annotation index = mm39.exclusive.idx
    Minimum number of genes required = 200
    Minimum number of counts required = None
    Number of threads = 80

    INFO : Loading the genome annotation index... 2022-08-16 16:32:08
    INFO : Loaded 'mm39.exclusive.idx' binary file with 4018326 items
    ['1', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '2', '3', '4', '5', '6', '7', '8', '9', 'M', 'X', 'Y']
    INFO : Finished loading the genome annotation index... 2022-08-16 16:32:54

    INFO : Processing BAM/SAM files ...2022-08-16 16:32:54
    INFO : Input SAM/BAM file appears to be valid
    CB UB good

    INFO : Done BAM/SAM files processing ...2022-08-16 16:59:20

    INFO : Splitting ...2022-08-16 16:59:20
    INFO : Executing multiple thread path with 80 threads
    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/lib/python3.9/multiprocessing/pool.py", line 125, in worker
        result = (True, func(*args, **kwds))
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
        return list(map(*args))
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/lib/python3.9/site-packages/scTE-1.0-py3.9.egg/scTE/base.py", line 366, in splitChr
        CRs[t[3]] += 1
    IndexError: list index out of range
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/bin/scTE", line 4, in <module>
        __import__('pkg_resources').run_script('scTE==1.0', 'scTE')
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/lib/python3.9/site-packages/pkg_resources/__init__.py", line 665, in run_script
        self.require(requires)[0].run_script(script_name, ns)
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/lib/python3.9/site-packages/pkg_resources/__init__.py", line 1463, in run_script
        exec(code, namespace, namespace)
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/lib/python3.9/site-packages/scTE-1.0-py3.9.egg/EGG-INFO/scripts/scTE", line 169, in <module>
        main()
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/lib/python3.9/site-packages/scTE-1.0-py3.9.egg/EGG-INFO/scripts/scTE", line 134, in main
        pool.map(partial_work, chr_list)
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/lib/python3.9/multiprocessing/pool.py", line 364, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/lib/python3.9/multiprocessing/pool.py", line 771, in get
        raise self._value
    IndexError: list index out of range

================

Thanks, Sam

ShambaMondal commented 1 year ago

UPDATE: The error occurs even without the "-p" option.

    $ scTE -i input.bam -o out -x mm39.exclusive.idx -CB CB -UMI UB
    DEBUG : Creating converter from 7 to 5
    DEBUG : Creating converter from 5 to 7
    DEBUG : Creating converter from 7 to 5
    DEBUG : Creating converter from 5 to 7
    INFO : Parameter list:
    Sample = out
    Reference annotation index = mm39.exclusive.idx
    Minimum number of genes required = 200
    Minimum number of counts required = None
    Number of threads = 1

    INFO : Loading the genome annotation index... 2022-08-16 18:09:32
    INFO : Loaded 'mm39.exclusive.idx' binary file with 4018326 items
    ['1', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '2', '3', '4', '5', '6', '7', '8', '9', 'M', 'X', 'Y']
    INFO : Finished loading the genome annotation index... 2022-08-16 18:10:15

    INFO : Processing BAM/SAM files ...2022-08-16 18:10:15
    INFO : Input SAM/BAM file appears to be valid
    CB UB good

    INFO : Done BAM/SAM files processing ...2022-08-16 18:41:25

    INFO : Splitting ...2022-08-16 18:41:25
    INFO : Executing single thread path
    Traceback (most recent call last):
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/bin/scTE", line 4, in <module>
        __import__('pkg_resources').run_script('scTE==1.0', 'scTE')
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/lib/python3.9/site-packages/pkg_resources/__init__.py", line 665, in run_script
        self.require(requires)[0].run_script(script_name, ns)
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/lib/python3.9/site-packages/pkg_resources/__init__.py", line 1463, in run_script
        exec(code, namespace, namespace)
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/lib/python3.9/site-packages/scTE-1.0-py3.9.egg/EGG-INFO/scripts/scTE", line 169, in <module>
        main()
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/lib/python3.9/site-packages/scTE-1.0-py3.9.egg/EGG-INFO/scripts/scTE", line 129, in main
        whitelist = splitAllChrs(chr_list, filename=outname, genenumber=args.genenumber, countnumber=args.countnumber, UMI=args.UMI)
      File "/home/sam/softwares/anaconda2/envs/scTE_python3.9/lib/python3.9/site-packages/scTE-1.0-py3.9.egg/scTE/base.py", line 263, in splitAllChrs
        CRs[t[3]] += 1
    IndexError: list index out of range

jphe commented 1 year ago

It seems that some of the reads do not have a cell barcode or UMI tag. Can you try pre-filtering the BAM file to remove those reads?

ShambaMondal commented 1 year ago

@jphe : Thank you! scTE ran successfully after the following small modifications:

  1. Around line 263 of base.py (in splitAllChrs), as follows:

    if UMI:
        if line in uniques[chrom]:
            continue
        uniques[chrom].add(line)
        try:
            CRs[t[3]] += 1 # This was the original line in the code.
        except IndexError:
            pass
    else:
        try:
            CRs[t[3]] += 1 # This was the original line in the code. 
        except IndexError:
            pass

  2. A similar error was thrown for t[3] within the "align" function. I modified it in the same way:

    try:
        barcode = t[3] # This was the original code.
    except IndexError:
        barcode = '' # empty string

ShambaMondal commented 1 year ago

UPDATE: Even though the first sample ran successfully, scTE rejected other samples that had reads without a cell barcode tag within the first hundred reads of the corresponding BAM files. So in the end I followed your suggestion and filtered the BAM files. scTE ran without problems afterwards. Thanks again :)
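For anyone applying the try/except patch from the earlier comment: it drops short records silently, so it is hard to tell how many reads were affected. Below is a minimal sketch of the same idea with an explicit length check and a skip counter. It is not scTE's code. The function name `count_barcodes`, the tab-separated record layout, and the barcode sitting in the fourth column (matching `t[3]` in the traceback) are assumptions made for illustration.

```python
from collections import Counter


def count_barcodes(lines, barcode_col=3):
    """Count reads per barcode and report how many records had no barcode.

    Assumes tab-separated records with the barcode in column `barcode_col`
    (0-based), mirroring `t[3]` in the traceback. Hypothetical helper.
    """
    counts = Counter()
    skipped = 0
    for line in lines:
        t = line.rstrip("\n").split("\t")
        # A record with too few columns is what raised IndexError in the
        # thread; a record with an empty barcode field is skipped as well.
        if len(t) <= barcode_col or not t[barcode_col]:
            skipped += 1
            continue
        counts[t[barcode_col]] += 1
    return counts, skipped
```

A large `skipped` value relative to the total suggests that pre-filtering the BAM file is the cleaner fix, since patched code elsewhere in the pipeline may still reject such reads.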

alecpnkw commented 1 year ago

I am also getting this error. @ShambaMondal @jphe, do you have any tips on what the samtools command should be to remove untagged reads? Thanks.
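The thread does not record the exact command that was used, so the following is only a sketch of one possible pre-filter, written in Python and operating on SAM text rather than on the BAM file directly. It keeps header lines and any alignment that carries both the `CB` and `UB` tags (the tag names passed to `-CB` and `-UMI` above). The function names `has_tags` and `keep_tagged` are hypothetical.

```python
def has_tags(sam_line, required=("CB", "UB")):
    """Return True if one SAM alignment line carries every required tag."""
    fields = sam_line.rstrip("\n").split("\t")
    # SAM has 11 mandatory columns; optional TAG:TYPE:VALUE fields follow.
    present = {field.split(":", 1)[0] for field in fields[11:]}
    return all(tag in present for tag in required)


def keep_tagged(lines, required=("CB", "UB")):
    """Yield header lines unchanged and only the alignments with all tags."""
    for line in lines:
        # Header lines start with "@"; read names cannot start with "@".
        if line.startswith("@") or has_tags(line, required):
            yield line
```

One way to use it, assuming samtools is installed: save the functions in a script that ends with `sys.stdout.writelines(keep_tagged(sys.stdin))` (after `import sys`), pipe the output of `samtools view -h input.bam` into that script, and pipe the script's output into `samtools view -b -o filtered.bam -`. Streaming through Python will be slow on large files, so treat this as a way to check the approach rather than a tuned solution.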