arpcard / rgi

Resistance Gene Identifier (RGI). Software to predict resistomes from protein or nucleotide data, including metagenomics data, based on homology and SNP models.

RGI bwt: paired end VS single end. Which one is better? #175

Closed valei closed 11 months ago

valei commented 2 years ago

Hi, I ran RGI bwt on a metagenome sample in two different ways: 1) using the R1 and R2 paired-end files as input; 2) using a concatenated R1+R2 file as input, thereby treating the sample as single-end. The results at both the allele and gene level are completely different for a large share of the genes/alleles found (about 30-40%). Why is there such a big difference? What is the most reliable way to run the tool?

raphenya commented 1 year ago

@valei, which commands did you use to concatenate the files?

valei commented 1 year ago

I simply used the cat command: cat sample_R1.fastq sample_R2.fastq > sample.fastq

raphenya commented 1 year ago

@valei the cat command just appends the reads, one file after the other. If you run rgi bwt with -1 and -2, i.e. the paired-reads option, and then run it again with only R1 or only R2 passed to -1, you should get the same results. Look into current methods for effectively merging R1 and R2 into single-end reads.
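
To illustrate the point about cat appending the reads, here is a minimal Python sketch on toy data. The read names, sequences, and the `parse_fastq` helper are hypothetical and exist only for this illustration. The sketch shows only what the concatenated file contains; it says nothing about how RGI or its aligner scores those reads.

```python
def parse_fastq(text):
    """Yield (read_id, sequence) from FASTQ text, assuming strict 4-line records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        # Header line looks like "@frag1 1"; keep only the fragment name.
        yield lines[i][1:].split()[0], lines[i + 1]


# Hypothetical toy data: two fragments, each sequenced from both ends.
R1 = "@frag1 1\nACGT\n+\nIIII\n@frag2 1\nGGCC\n+\nIIII\n"
R2 = "@frag1 2\nTTAA\n+\nIIII\n@frag2 2\nCCGG\n+\nIIII\n"

# Same effect as: cat sample_R1.fastq sample_R2.fastq > sample.fastq
concatenated = R1 + R2

# Paired input: each fragment is one unit made of two mates.
paired = list(zip(parse_fastq(R1), parse_fastq(R2)))

# Concatenated input: every mate becomes an independent single-end read.
single = list(parse_fastq(concatenated))
single_ids = [read_id for read_id, _ in single]

print(len(paired))   # number of fragments seen as pairs
print(len(single))   # number of records seen as single-end reads
print(single_ids)    # mates are no longer adjacent or linked
```

In the concatenated file each fragment appears twice, as two unrelated records, and no overlapping mates are joined into a longer read. That differs from what a dedicated read-merging tool does, and the pairing information is no longer available to the aligner.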

raphenya commented 11 months ago

@valei see https://drive5.com/usearch/manual9.2/merge_pair.html and https://drive5.com/usearch/manual9.2/cmd_fastq_mergepairs.html

raphenya commented 11 months ago

@valei I did some analysis, which you can see at https://github.com/raphenya/read-merging. Long story short, merging with cat produces a lower hit rate.

@agmcarthur note.

Cheers.