bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.
GNU General Public License v3.0

Inconsistent taxonomy assignment results for the same sequences #788

Open XHe20 opened 4 months ago

XHe20 commented 4 months ago

I used DIAMOND and MEGAN to assign taxonomy to my contigs:

diamond blastx -d nrdb.dmnd -q final.contigs.part_001.fa \
-o final.contigs_graham_01.daa -F 15 --range-culling -f 100 \
-t ./ --threads 32 --fast --max-target-seqs 100

daa-meganizer -i final.contigs_graham_01.daa \
-mdb megan-map-Feb2022.db --longReads

I exported the taxonomy assignments at the Class level from MEGAN, and 1306 contigs were assigned to Mammalia. I then re-ran the commands above on just those 1306 sequences, and only 97.6% of them were assigned to Mammalia. I expected all 1306 sequences to be assigned to Mammalia again.

Then I set --masking 0 and ran the analysis again:

diamond blastx -d nrdb.dmnd -q final.contigs.part_001.fa \
-o final.contigs_graham_01_2.daa -F 15 --range-culling -f 100 \
-t ./ --threads 32 --fast --max-target-seqs 100 --masking 0

daa-meganizer -i final.contigs_graham_01_2.daa \
-mdb megan-map-Feb2022.db --longReads

I again re-ran the commands above on the contigs assigned to Mammalia, and this time only 82.2% of the sequences were assigned to Mammalia.

What causes this inconsistency, and which parameters would make the results more consistent between runs?

bbuchfink commented 3 months ago

I'm not really sure what's happening here. You would have to look at all the alignments in detail.
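
One way to narrow down which alignments to inspect is to list the contigs whose assignment changed between the two runs. The following is a minimal sketch, not part of DIAMOND or MEGAN. It assumes each run's assignments have been exported as tab-separated lines of the form `contig<TAB>taxon`; that layout and the function names `load_assignments` and `compare_runs` are hypothetical and may need adapting to the actual export.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def load_assignments(lines: Iterable[str]) -> Dict[str, str]:
    """Parse tab-separated 'contig<TAB>taxon' lines into a dict.

    The two-column layout is an assumption about the export format.
    """
    assignments: Dict[str, str] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) >= 2:  # skip blank or malformed lines
            assignments[row[0].strip()] = row[1].strip()
    return assignments


def compare_runs(
    first: Dict[str, str],
    second: Dict[str, str],
    taxon: str = "Mammalia",
) -> Tuple[float, List[Tuple[str, str]]]:
    """Compare two runs for the contigs assigned to `taxon` in the first run.

    Returns the fraction of those contigs that kept the assignment, and a
    list of (contig, new assignment) pairs for the ones that did not.
    Contigs missing from the second run are reported as 'unassigned'.
    """
    original = sorted(c for c, t in first.items() if t == taxon)
    moved = [
        (c, second.get(c, "unassigned"))
        for c in original
        if second.get(c) != taxon
    ]
    retained = (len(original) - len(moved)) / len(original) if original else 0.0
    return retained, moved


if __name__ == "__main__":
    # Toy data for illustration only; real input would come from the exports.
    run1 = load_assignments(["c1\tMammalia\n", "c2\tMammalia\n", "c3\tAves\n"])
    run2 = load_assignments(["c1\tMammalia\n", "c2\tActinopteri\n"])
    fraction, changed = compare_runs(run1, run2)
    print(f"retained: {fraction:.1%}")
    for contig, new_taxon in changed:
        print(contig, new_taxon, sep="\t")
```

The contig IDs in the second list are the ones whose alignments are worth comparing between the original run and the re-run.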