arpcard / rgi

Resistance Gene Identifier (RGI). Software to predict resistomes from protein or nucleotide data, including metagenomics data, based on homology and SNP models.

resolve the hang problem #226 #254

Closed hunglin59638 closed 11 months ago

hunglin59638 commented 11 months ago

This PR replaces multiprocessing with concurrent.futures to resolve the hang, and preloads output_tab_sequences into a dictionary to improve program efficiency.
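For readers unfamiliar with the pattern, here is a minimal sketch of dispatching per-read work through concurrent.futures. It is not the code from this PR: the task function count_bases, the helper run_tasks, and the sample reads are hypothetical. The executor type used in the PR is not stated here, so the sketch uses ThreadPoolExecutor to stay self-contained; ProcessPoolExecutor exposes the same submit/result interface.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def count_bases(read):
    """Hypothetical per-read task: return the read ID and sequence length."""
    read_id, sequence = read
    return read_id, len(sequence)


def run_tasks(reads, max_workers=4):
    """Submit one task per read and collect results as they complete."""
    results = {}
    # The context manager waits for all workers and shuts the pool down on exit.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(count_bases, read) for read in reads]
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the worker,
            # so a failed task surfaces here as an error.
            read_id, length = future.result()
            results[read_id] = length
    return results


# Made-up sample data for illustration only.
reads = [("read1", "ACGT"), ("read2", "ACGTAC"), ("read3", "A")]
lengths = run_tasks(reads)
```

Calling future.result() on every future means worker exceptions propagate to the caller, which makes failures visible.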

Tested with SRA FASTQ accession DRR387644: Illumina 150 bp paired-end reads from Klebsiella pneumoniae. File size is 541M.

fasterq-dump -O fq DRR387644
time rgi bwt -1 fq/DRR387644_1.fastq -2 fq/DRR387644_2.fastq -a kma -o bwt_out --include_wildcard --include_other_models
real    4m33.099s
user    6m31.818s
sys     0m50.640s

It took about 4.5 minutes to finish. The current code (778b83d) ran for more than 1 hour without finishing, so I did not wait for it to complete.

raphenya commented 11 months ago

@hunglin59638 can you account for the missing intermediate files, i.e. FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/rgi/rgi/tests/outputs/output_bwt_kma_interleaved.seqs.temp.txt'? I will review the code after all the tests pass. Cheers.
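One possible way to account for a missing intermediate file, combined with the dictionary preloading mentioned in the PR description, is sketched below. This is an illustration under stated assumptions, not the fix applied in RGI: the function preload_sequences is hypothetical, and the file layout (one tab-separated "read ID, sequence" pair per line) is assumed, since the real format of the .seqs.temp.txt file is not shown in this thread.

```python
import csv
import tempfile
from pathlib import Path


def preload_sequences(path):
    """Load an assumed 'id<TAB>sequence' file into a dict.

    Returns an empty dict when the file does not exist, instead of
    raising FileNotFoundError.
    """
    path = Path(path)
    if not path.is_file():
        return {}
    with path.open(newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        # Skip rows with fewer than two columns (blank or malformed lines).
        return {row[0]: row[1] for row in reader if len(row) >= 2}


# Demonstration with a throwaway directory and made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    seqs_file = Path(tmp) / "example.seqs.temp.txt"
    seqs_file.write_text("read1\tACGT\nread2\tGGCC\n")
    loaded = preload_sequences(seqs_file)
    missing = preload_sequences(Path(tmp) / "absent.seqs.temp.txt")
```

Whether an empty result is the right behaviour depends on the caller. If the intermediate file is required, raising a clearer error at this point may be preferable to silently continuing.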