dib-lab / genome-grist

map Illumina metagenomes to genomes!
https://dib-lab.github.io/genome-grist/

Generic StreamReadError error #190

Open hehouts opened 2 years ago

hehouts commented 2 years ago

I'm trying to run grist on one sample, starting from SRA accession numbers (not trying to use local genomes). I'm getting an "OSError: Generic StreamReadError error" during rule kmer_trim_reads_wc.

The config looks like this:

samples:
#- SRR3051889
- SRR3051890
#- SRR3051891
#- SRR3051892
#- SRR3051893

outdir: outputs.sra5smpl
metagenome_trim_memory: 64e9
prefetch_memory: 64e9

sourmash_compute_ksizes:
- 21
- 31
- 51

sourmash_databases:
- /group/ctbrowngrp/gtdb/databases/gtdb-rs202.genomic-reps.k31.zip

taxonomies:
- /group/ctbrowngrp/genbank/all_genbank_lineages.20200727.csv

I requested 66 GB of memory, and max memory used was only ~40 GB. Not having enough memory was my best guess at the problem. Not sure what to try next.

hehouts commented 2 years ago

This is with genome-grist version v0.8.4.dev6+g7590338, and the most recent .err file is here: /home/hehouts/dynamic-duos-virome/grist/sra5smpl/jobs/sra5.j50895726.err

I tried running SRR3051889 and SRR3051890, and they both had this error.

ctb commented 2 years ago

hi @hehouts yay cool new errors 😆

It looks to me like the error is in trim-low-abund:

  File "/home/hehouts/dynamic-duos-virome/grist/sra5smpl/.snakemake/conda/9f06064bd6e082e98e81386024b7b271/bin/trim-low-abund.py", line 212, in pass1
    for n, is_pair, read1, read2 in reader:
  File "/home/hehouts/dynamic-duos-virome/grist/sra5smpl/.snakemake/conda/9f06064bd6e082e98e81386024b7b271/lib/python3.8/site-packages/khmer/utils.py", line 81, in broken_paired_reader
    for record in screed_iter:
OSError: Generic StreamReadError error

and it may be in reading from the file.

Can you download one of these files yourself (using SRA toolkit) and take a look at the sequences?
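
One way to "take a look" at a downloaded file is to read it end to end and see whether it decompresses cleanly. The sketch below is only an illustration, not part of genome-grist or khmer: the function name check_gzip_fastq is made up here, and it assumes the file is a gzip-compressed FASTQ with four lines per record. A truncated or corrupt gzip stream is one plausible cause of a read error like this, but not the only one.

```python
import gzip
import sys
import zlib


def check_gzip_fastq(path):
    """Read a gzipped FASTQ file to the end and report whether it looks intact.

    Returns (ok, n_lines, message). Assumes 4-line FASTQ records.
    """
    n_lines = 0
    try:
        with gzip.open(path, "rb") as fp:
            for _ in fp:
                n_lines += 1
    except (EOFError, OSError, zlib.error) as exc:
        # EOFError: stream ended early (truncated download).
        # OSError / zlib.error: not a gzip file, or corrupt compressed data.
        return False, n_lines, f"{type(exc).__name__}: {exc}"

    if n_lines % 4 != 0:
        return False, n_lines, "line count is not a multiple of 4"
    return True, n_lines, "ok"


if __name__ == "__main__":
    # Usage: python check_fastq.py <file.fastq.gz> [more files...]
    for filename in sys.argv[1:]:
        ok, n_lines, message = check_gzip_fastq(filename)
        print(f"{filename}\t{'OK' if ok else 'FAILED'}\t{n_lines} lines\t{message}")
```

If the file passes this check, the problem is more likely in the read names or pairing than in the compressed stream itself.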

ccbaumler commented 1 year ago

Just ran into this error myself on another workflow. Any further documentation on this issue?

ccbaumler commented 1 year ago

Update: my OSError was due to faulty paired-end sequence metadata. Since interleaved-reads.py was piping into trim-low-abund.py, the interleaved-reads.py error was masked.

See here
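
For anyone debugging a similar pipe, the masking effect can be reproduced with a small sketch. When two commands are piped together, the pipeline's reported status normally comes from the last command, so a failure in the upstream command is easy to miss. The example below is an assumption-laden illustration: run_piped is a hypothetical helper, and the two commands are tiny Python stand-ins for the upstream and downstream scripts, not the real khmer tools.

```python
import subprocess
import sys


def run_piped(producer_cmd, consumer_cmd):
    """Run `producer | consumer` and return (producer_rc, consumer_rc, output)."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close our copy of the pipe so the consumer sees EOF when the producer exits.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer_rc = producer.wait()
    return producer_rc, consumer.returncode, output


# Stand-in for the upstream script: writes partial output, then fails.
failing_producer = [
    sys.executable, "-c", "import sys; print('partial'); sys.exit(3)",
]
# Stand-in for the downstream script: copies stdin to stdout.
consumer = [
    sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())",
]

producer_rc, consumer_rc, output = run_piped(failing_producer, consumer)
print("producer exit code:", producer_rc)
print("consumer exit code:", consumer_rc)
if producer_rc != 0:
    print("upstream command failed; downstream output may be incomplete")
```

Checking each stage's exit code separately (or running the upstream command on its own, writing to a file) makes the original error visible instead of whatever the downstream reader reports.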