Open xinehc opened 1 year ago
Hi,
I noticed that sometime the results is not reproducible using the long read mode, I wonder is it related to the internal seed or tie breaking?
Here is a minimal example: diamond blastx \ --db protein.txt \ --query query.txt \ --outfmt 6 qseqid sseqid pident length qlen qstart qend slen sstart send evalue bitscore \ --range-culling -F 15 \ --max-target-seqs 25 --max-hsps 0 \ --threads 16
First run:
JAHHUZ010000004.1 WP_291717584.1 89.9 119 158376 144690 145046 119 1 119 1.31e-68 216 JAHHUZ010000004.1 WP_106008235.1 89.1 119 158376 13687 13331 119 1 119 1.78e-68 216
Second run:
JAHHUZ010000004.1 WP_291717584.1 89.9 119 158376 13687 13331 119 1 119 1.31e-68 216 JAHHUZ010000004.1 WP_106008235.1 89.1 119 158376 13687 13331 119 1 119 1.78e-68 216
Third run:
JAHHUZ010000004.1 WP_291717584.1 89.9 119 158376 144690 145046 119 1 119 1.31e-68 216 JAHHUZ010000004.1 WP_106008235.1 89.1 119 158376 144690 145046 119 1 119 1.78e-68 216
protein.txt query.txt
Update: this seem to be a corner case, the reverse complement of the sequence is the same as the original sequence:
seqkit grep -p 'JAHHUZ010000004.1' query.txt > contig.txt seqtk seq contig.txt > forward.txt seqtk seq -r contig.txt > reverse.txt md5sum forward.txt reverse.txt 89c390f039087b7265e794b35912e0ba forward.txt 89c390f039087b7265e794b35912e0ba reverse.txt
There seems to be a non-determinism in reporting equally scoring hits. I will fix this when I get the chance.
Hi,
I noticed that sometime the results is not reproducible using the long read mode, I wonder is it related to the internal seed or tie breaking?
Here is a minimal example: diamond blastx \ --db protein.txt \ --query query.txt \ --outfmt 6 qseqid sseqid pident length qlen qstart qend slen sstart send evalue bitscore \ --range-culling -F 15 \ --max-target-seqs 25 --max-hsps 0 \ --threads 16
First run:
Second run:
Third run:
protein.txt query.txt
Update: this seem to be a corner case, the reverse complement of the sequence is the same as the original sequence: