bcgsc / LongStitch

Correct and scaffold assemblies using long reads
GNU General Public License v3.0

IndexError: list index out of range #82

Closed: mmpust closed this issue 2 months ago

mmpust commented 2 months ago

Dear developers, I would like to correct/break a scaffold with nanopore long reads after running RagTag. I have installed LongStitch and its dependencies with mamba.

This is my input directory

-rw-rw-r-- 1 usr usr 1.1M Jul  9 18:02 ragtag.scaffold.agp
-rw-rw-r-- 1 usr usr  31K Jul  9 18:02 ragtag.scaffold.asm.paf
-rw-rw-r-- 1 usr usr  820 Jul  9 18:02 ragtag.scaffold.asm.paf.log
-rw-rw-r-- 1 usr usr  717 Jul  9 18:02 ragtag.scaffold.confidence.txt
-rw-rw-r-- 1 usr usr    0 Jul  9 18:02 ragtag.scaffold.err
-rw-rw-r-- 1 usr usr  95M Jul  9 18:02 ragtag.scaffold.fasta
-rw-rw-r-- 1 usr usr 271K Jul  9 18:02 ragtag.scaffold.filt.fa
-rw-rw-r-- 1 usr usr   35 Jul  9 18:07 ragtag.scaffold.filt.fa.fai
-rw-rw-r-- 1 usr usr  118 Jul  9 18:02 ragtag.scaffold.stats
-rw-rw-r-- 1 usr usr 2.6M Jul  9 18:02 ref_Akkermansia_muciniphila_ATCC_BAA-835_genomic.fna
-rw-rw-r-- 1 usr usr 1.4G Jul  9 18:02 treatedM06day00_nanopore.trimmed.filt.fq.gz

I am running

 longstitch run draft=ragtag.scaffold.filt reads=treatedM06day00_nanopore.trimmed.filt G=2664102

and get the following error message

tigmint-make tigmint-long draft=ragtag.scaffold.filt reads=treatedM06day00_nanopore.trimmed.filt cut=250 t=8 G=2664102 span=auto dist=auto longmap=ont
make[1]: Entering directory '/home/usr/input_data/nanopore/run_scaffolding/test_file/treatedM06day00_ref_Akkermansia_muciniphila_ATCC_BAA-835_scaffolds'
/home/usr/miniconda3/envs/ragtag-env/bin/share/tigmint-1.2.10-2/bin/tigmint_estimate_dist.py treatedM06day00_nanopore.trimmed.filt.fq.gz -n 1000000 -o treatedM06day00_nanopore.trimmed.filt.tigmint-long.params.tsv
sh -c '/home/usr/miniconda3/envs/ragtag-env/bin/share/tigmint-1.2.10-2/bin/../src/long-to-linked-pe -l 250 -m2000 -g2664102 -s -b treatedM06day00_nanopore.trimmed.filt.barcode-multiplicity.tsv --bx -t8 --fasta -f treatedM06day00_nanopore.trimmed.filt.tigmint-long.params.tsv treatedM06day00_nanopore.trimmed.filt.fq.gz | \
minimap2 -y -t8 -x map-ont --secondary=no ragtag.scaffold.filt.fa - | \
/home/usr/miniconda3/envs/ragtag-env/bin/share/tigmint-1.2.10-2/bin/tigmint_molecule_paf.py -q0 -s2000 -p treatedM06day00_nanopore.trimmed.filt.tigmint-long.params.tsv - | sort -k1,1 -k2,2n -k3,3n  > ragtag.scaffold.filt.treatedM06day00_nanopore.trimmed.filt.cut250.molecule.size2000.distauto.bed'
long-to-linked-pe v1.2.10: Using more than 6 threads does not scale, reverting to 6.
[M::mm_idx_gen::0.013*1.11] collected minimizers
[M::mm_idx_gen::0.018*2.62] sorted minimizers
[M::main::0.018*2.62] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.019*2.49] mid_occ = 3
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.020*2.42] distinct minimizers: 50087 (99.68% are singletons); average occurrences: 1.004; average spacing: 5.414
Traceback (most recent call last):
  File "/home/usr/miniconda3/envs/ragtag-env/bin/share/tigmint-1.2.10-2/bin/tigmint_molecule_paf.py", line 141, in <module>
    main()
  File "/home/usr/miniconda3/envs/ragtag-env/bin/share/tigmint-1.2.10-2/bin/tigmint_molecule_paf.py", line 138, in main
    MolecIdentifierPaf().run()
  File "/home/usr/miniconda3/envs/ragtag-env/bin/share/tigmint-1.2.10-2/bin/tigmint_molecule_paf.py", line 85, in run
    paf_entry[18]
IndexError: list index out of range
/home/usr/miniconda3/envs/ragtag-env/bin/share/tigmint-1.2.10-2/bin/tigmint-cut -p8 -w1000 -t0 -m3000 -f treatedM06day00_nanopore.trimmed.filt.tigmint-long.params.tsv -o ragtag.scaffold.filt.treatedM06day00_nanopore.trimmed.filt.cut250.molecule.size2000.distauto.trim0.window1000.spanauto.breaktigs.fa ragtag.scaffold.filt.fa ragtag.scaffold.filt.treatedM06day00_nanopore.trimmed.filt.cut250.molecule.size2000.distauto.bed
Started at: 2024-07-09 18:20:28.552021
tigmint-cut: error: calculated span parameter not found in parameter file 'treatedM06day00_nanopore.trimmed.filt.tigmint-long.params.tsv'
make[1]: *** [/home/usr/miniconda3/envs/ragtag-env/bin/share/tigmint-1.2.10-2/bin/tigmint-make:343: ragtag.scaffold.filt.treatedM06day00_nanopore.trimmed.filt.cut250.molecule.size2000.distauto.trim0.window1000.spanauto.breaktigs.fa] Error 1
make[1]: Leaving directory '/home/usr/input_data/nanopore/run_scaffolding/test_file/treatedM06day00_ref_Akkermansia_muciniphila_ATCC_BAA-835_scaffolds'
make: *** [/home/usr/miniconda3/envs/ragtag-env/bin/share/longstitch-1.0.5-0/longstitch:215: ragtag.scaffold.filt.cut250.tigmint.fa] Error 2

I think the problem is that the distauto.bed file is empty?

-rw-rw-r-- 1 usr usr 1.1M Jul  9 18:02 ragtag.scaffold.agp
-rw-rw-r-- 1 usr usr  31K Jul  9 18:02 ragtag.scaffold.asm.paf
-rw-rw-r-- 1 usr usr  820 Jul  9 18:02 ragtag.scaffold.asm.paf.log
-rw-rw-r-- 1 usr usr  717 Jul  9 18:02 ragtag.scaffold.confidence.txt
-rw-rw-r-- 1 usr usr    0 Jul  9 18:02 ragtag.scaffold.err
-rw-rw-r-- 1 usr usr  95M Jul  9 18:02 ragtag.scaffold.fasta
-rw-rw-r-- 1 usr usr 271K Jul  9 18:02 ragtag.scaffold.filt.fa
-rw-rw-r-- 1 usr usr   35 Jul  9 18:07 ragtag.scaffold.filt.fa.fai
-rw-rw-r-- 1 usr usr    0 Jul  9 18:20 ragtag.scaffold.filt.treatedM06day00_nanopore.trimmed.filt.cut250.molecule.size2000.distauto.bed
-rw-rw-r-- 1 usr usr  118 Jul  9 18:02 ragtag.scaffold.stats
-rw-rw-r-- 1 usr usr 2.6M Jul  9 18:02 ref_Akkermansia_muciniphila_ATCC_BAA-835_genomic.fna
-rw-rw-r-- 1 usr usr 1.6M Jul  9 18:20 treatedM06day00_nanopore.trimmed.filt.barcode-multiplicity.tsv
-rw-rw-r-- 1 usr usr 1.4G Jul  9 18:02 treatedM06day00_nanopore.trimmed.filt.fq.gz
-rw-rw-r-- 1 usr usr   14 Jul  9 18:20 treatedM06day00_nanopore.trimmed.filt.tigmint-long.params.tsv

Any idea what is causing this? Thanks for developing LongStitch!
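The empty-output check done by eye in the listing above can also be scripted. The following is a minimal sketch, not part of LongStitch or tigmint: the helper name `find_empty_files` and the `*.bed` pattern are illustrative assumptions. It reports zero-byte files in a directory, which is how the empty `distauto.bed` file shows up.

```python
from pathlib import Path


def find_empty_files(directory, pattern="*"):
    """Return the sorted names of zero-byte regular files in `directory`.

    `find_empty_files` is a hypothetical helper for illustration only;
    it is not provided by LongStitch or tigmint.
    """
    return sorted(
        p.name
        for p in Path(directory).glob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # Restrict to BED files: some empty files (e.g. an .err log) are normal.
    for name in find_empty_files(".", "*.bed"):
        print(f"empty: {name}")
```

Restricting the pattern matters here, because an empty file is not always a failure: `ragtag.scaffold.err` in the listing is also 0 bytes.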

lcoombe commented 2 months ago

Hi @mmpust,

Thanks for reaching out! Just a few follow-up questions to help me understand what the issue is, so I can help you better.

Thank you for your interest in LongStitch! Lauren

mmpust commented 2 months ago

Hi Lauren, Thank you for the quick reply!

minimap2 --version
2.14-r883

If I run ./run_longstitch_demo.sh, I get the same IndexError, but the pipeline still continues. I have attached the log file here: LOGrun.txt

I don't have the file (treatedM06day00_nanopore.trimmed.filt.tigmint-long.params.tsv) anymore. Let me reproduce it.

Thanks again! Marie

lcoombe commented 2 months ago

Hi Marie,

Thanks for that information! Good to know that the demo also throws that error - that does suggest an issue with your environment itself.

As a first step, could you try updating the minimap2 version in your conda environment? According to the minimap2 release notes, that version is from 2018, so it may be older than anything we have tested LongStitch with. Ideally, update to the most recent version (2.28), but any version from 2.20 onwards would be good to try.

Thanks! Lauren
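The version check suggested above can be automated. This is a minimal sketch under stated assumptions: the 2.20 lower bound is taken from the comment above, the function names are hypothetical, and the version string is assumed to look like the `2.14-r883` reported earlier in the thread.

```python
import re
import subprocess

MIN_VERSION = (2, 20)  # lower bound suggested in this thread, not an official requirement


def parse_minimap2_version(text):
    """Parse a 'MAJOR.MINOR[-rNNN]' string such as '2.14-r883' into (major, minor)."""
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(text, minimum=MIN_VERSION):
    """Compare as integer tuples, so that 2.14 sorts below 2.20 and 2.9 below 2.14."""
    return parse_minimap2_version(text) >= minimum


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["minimap2", "--version"], capture_output=True, text=True
        )
        version = result.stdout.strip()
        status = "ok" if is_supported(version) else "too old, consider updating"
        print(f"minimap2 {version}: {status}")
    except (FileNotFoundError, ValueError) as err:
        print(f"could not determine the minimap2 version: {err}")
```

Comparing tuples of integers rather than strings or floats avoids misordering versions such as 2.9 and 2.14.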

mmpust commented 2 months ago

This solved the problem! Thank you so much, Lauren!

lcoombe commented 2 months ago

Excellent - I'm so glad to hear that fixed it!
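The thread does not say why the older minimap2 triggered the error. The traceback shows a fixed positional access, `paf_entry[18]`, and PAF lines have 12 mandatory columns followed by a variable number of optional `key:type:value` tags, so one plausible reading (an assumption, not confirmed here) is that the older release wrote fewer optional columns. The sketch below is not tigmint's code: it shows looking a tag up by name instead of by position, assuming the field of interest is a `BX:Z:` barcode tag, which the `--bx` and `-y` options in the logged command suggest but do not prove.

```python
PAF_MANDATORY_FIELDS = 12  # fixed by the PAF format


def parse_paf_tags(line):
    """Split one PAF line into its mandatory fields and a dict of optional tags."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < PAF_MANDATORY_FIELDS:
        raise ValueError(
            f"expected at least {PAF_MANDATORY_FIELDS} columns, got {len(fields)}"
        )
    tags = {}
    for field in fields[PAF_MANDATORY_FIELDS:]:
        parts = field.split(":", 2)  # key, type, value (value may contain ':')
        if len(parts) == 3:
            key, _type, value = parts
            tags[key] = value
    return fields[:PAF_MANDATORY_FIELDS], tags


def get_barcode(line, tag="BX"):
    """Return the value of `tag`, or None if absent (the tag name is an assumption)."""
    _, tags = parse_paf_tags(line)
    return tags.get(tag)
```

With a name-based lookup, a missing tag yields `None` rather than an `IndexError`, regardless of how many optional columns a given minimap2 release emits.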