arpcard / rgi

Resistance Gene Identifier (RGI). Software to predict resistomes from protein or nucleotide data, including metagenomics data, based on homology and SNP models.

resolve the hang problem #226 #253

Closed. hunglin59638 closed this 9 months ago.

hunglin59638 commented 9 months ago

- Replace multiprocessing with concurrent.futures to resolve the hang.
- Preload output_tab_sequences into a dictionary to improve efficiency.

Tested with SRA FASTQ run DRR387644: Illumina 150 bp paired-end reads from Klebsiella pneumoniae (file size 541M).

fasterq-dump -O fq DRR387644
time rgi bwt -1 fq/DRR387644_1.fastq -2 fq/DRR387644_2.fastq -a kma -o bwt_out --include_wildcard --include_other_models
real    4m33.099s
user    6m31.818s
sys     0m50.640s
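One way to read the timing above uses bash's `time`. In that output, user plus sys is the CPU time consumed by the command, including any child processes it waits on. If that total is larger than real, some of the work ran in parallel. The small parser below assumes bash's default `XmY.YYYs` format. It only summarizes the numbers reported above and does not show which stage was parallel.

```python
import re

TIME_OUTPUT = """\
real    4m33.099s
user    6m31.818s
sys     0m50.640s
"""


def parse_time(text):
    """Convert bash `time` lines such as 'real 4m33.099s' into seconds."""
    seconds = {}
    pattern = r"^(real|user|sys)\s+(\d+)m([\d.]+)s$"
    for label, minutes, secs in re.findall(pattern, text, re.MULTILINE):
        seconds[label] = int(minutes) * 60 + float(secs)
    return seconds


t = parse_time(TIME_OUTPUT)
cpu = t["user"] + t["sys"]  # total CPU seconds across the command
print(f"wall {t['real']:.1f}s, cpu {cpu:.1f}s, avg busy cores {cpu / t['real']:.2f}")
# wall 273.1s, cpu 442.5s, avg busy cores 1.62
```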

With these changes the run finished in about 4.5 minutes. The current code (778b83d) took more than 1 hour, so I did not wait for it to finish.
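The baseline run on 778b83d was stopped after more than an hour, so the speedup can only be bounded from below.

```python
baseline_min_s = 60 * 60      # "more than 1 hr": a lower bound, the run was not completed
patched_s = 4 * 60 + 33.099   # real time reported for the patched code
speedup_lower_bound = baseline_min_s / patched_s
print(f"speedup > {speedup_lower_bound:.1f}x")  # speedup > 13.2x
```

On this input, that makes the patched version at least about 13 times faster.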