arpcard / rgi

Resistance Gene Identifier (RGI). Software to predict resistomes from protein or nucleotide data, including metagenomics data, based on homology and SNP models.

fastq read limit for rgi bwt? #202

Closed annettemcgrath closed 10 months ago

annettemcgrath commented 1 year ago

Hi, I have been trying to run rgi bwt on trimmed metagenomic reads. The input paired-end Illumina read files are large: the files I have been using contain between 65M and 122M reads each, and I also have files with >300M reads to process.

The analysis proceeds quickly through SAM and BAM file creation, but then hangs until the script times out. The hang begins once the following files have been created: bwa.model_species_data_type.temp.txt, bwa.reference_mapping_stats.txt, bwa.coverage_all_positions.summary.temp.txt, and bwa.coverage_all_positions.temp.txt. This happens with both kma and bwa. I have not tried bowtie.

I have used subsets of the data - 50M, 20M and 10M paired reads - and get the same behaviour, again with both kma and bwa.

So far the only read set that has successfully completed is a 1M paired read subset.
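
For anyone trying to reproduce this, read subsets like the ones described above could be produced with something along these lines. This is only a minimal sketch and not necessarily how the subsets in this report were made: the function name `take_first_pairs` and the helper `_open` are hypothetical, it assumes standard four-line FASTQ records with both mate files in the same order, and it takes the first N pairs rather than a random sample.

```python
import gzip
from itertools import islice


def _open(path, mode):
    # Hypothetical helper: open plain or gzip-compressed FASTQ in text mode.
    if str(path).endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def take_first_pairs(r1_in, r2_in, r1_out, r2_out, n_pairs):
    """Copy the first n_pairs read pairs from two mate files.

    Assumes four lines per FASTQ record and identical read order in
    both inputs. Returns the number of pairs actually written.
    """
    written = 0
    with _open(r1_in, "r") as a, _open(r2_in, "r") as b, \
            _open(r1_out, "w") as out_a, _open(r2_out, "w") as out_b:
        while written < n_pairs:
            rec_a = list(islice(a, 4))  # one record from read 1
            rec_b = list(islice(b, 4))  # matching record from read 2
            if len(rec_a) < 4 or len(rec_b) < 4:
                break  # either input ran out of complete records
            out_a.writelines(rec_a)
            out_b.writelines(rec_b)
            written += 1
    return written
```

Taking both mates in lockstep keeps the pairing intact, which matters because the subsets are then passed to a paired-read aligner.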

I am using RGI 6.0.1 with a local database, including wildcards.

Thanks for your help

raphenya commented 1 year ago

@annettemcgrath Is it possible to send me the 10M paired reads so that I can test them? Please also send me the commands you are using. Cheers.

github-actions[bot] commented 10 months ago

Issue is stale and will be closed in 7 days unless there is new activity