At present, all sequences in the reference database are used if they are among the best hits, irrespective of the resolution of their taxon. Some are assigned to a species level, others to a higher level.
This can reduce the taxonomic resolution. For example, if we have two hits at 97% identity, where one reference sequence is identified to species level but the other only to family level, the variant will be assigned to the family.
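To make the problem concrete, here is a minimal sketch of a lowest-common-ancestor style assignment. It is an illustration only, not the tool's actual implementation, and assumes that each hit's lineage is a list of taxon names from superkingdom downwards that stops at the finest rank the reference sequence is identified to. The function name `lowest_common_ancestor` and the example lineages are hypothetical.

```python
# Rank order used for the (hypothetical) lineage lists, from broadest to finest
RANKS = ["superkingdom", "kingdom", "phylum", "class",
         "order", "family", "genus", "species"]

def lowest_common_ancestor(lineages):
    """Return (rank, taxon) of the deepest taxon shared by all lineages, or None."""
    shared = None
    for depth in range(min(len(lineage) for lineage in lineages)):
        names = {lineage[depth] for lineage in lineages}
        if len(names) != 1:  # lineages diverge at this rank
            break
        shared = (RANKS[depth], names.pop())
    return shared

# Illustrative lineages: one reference identified to species, one only to family
species_hit = ["Eukaryota", "Metazoa", "Arthropoda", "Insecta", "Diptera",
               "Chironomidae", "Chironomus", "Chironomus riparius"]
family_hit = species_hit[:6]  # stops at Chironomidae

print(lowest_common_ancestor([species_hit, family_hit]))  # ('family', 'Chironomidae')
```

Because the family-level reference has no genus or species entry, the shared part of the two lineages ends at the family, and the species-level information from the other hit is lost.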
I suggest that users should be able to set the minimum resolution of the reference sequences for each % identity threshold.
It could be something like this:
100% species
97% genus
95% family
90% order
85% class
80% phylum
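A minimal sketch of how such a table could be applied, assuming that an identity between two thresholds uses the rank of the lower threshold (e.g. 98.5% falls under the 97% row) and that hits below 80% are not used at all. `MIN_RESOLUTION` and `required_rank` are hypothetical names, not existing options of the tool.

```python
# Hypothetical user setting: minimum % identity -> minimum rank of the reference sequence
MIN_RESOLUTION = [(100.0, "species"), (97.0, "genus"), (95.0, "family"),
                  (90.0, "order"), (85.0, "class"), (80.0, "phylum")]

def required_rank(identity):
    """Return the minimum rank a reference must be identified to at this % identity."""
    for threshold, rank in MIN_RESOLUTION:  # ordered from highest to lowest threshold
        if identity >= threshold:
            return rank
    return None  # below the lowest threshold: no reference qualifies

print(required_rank(100.0))  # species
print(required_rank(98.5))   # genus
```

In the example above, at 97% identity the reference identified only to family would then be excluded, and only the species-level reference would be used.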
I have already made a taxonomy file with an additional column that contains the resolution index:
8: species
7: genus
6: family
5: order
4: class
3: phylum
2: kingdom
1: superkingdom
For other levels, the index is a non-integer, e.g. 7.5 for subgenus.
This greatly simplifies the selection of the reference sequences.
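With the resolution index, the selection reduces to a numeric comparison. The sketch below assumes the best hits are available as (sequence ID, % identity, resolution index) tuples; that format and the names `select_references` and `min_index_for` are hypothetical. Because subgenus is 7.5, it passes a genus-level (7) requirement but not a species-level (8) one without any special-casing.

```python
# Resolution index as defined in the taxonomy file (other levels are non-integers, e.g. 7.5 = subgenus)
RANK_INDEX = {"superkingdom": 1, "kingdom": 2, "phylum": 3, "class": 4,
              "order": 5, "family": 6, "genus": 7, "species": 8}

# Hypothetical per-identity minimum ranks, taken from the example thresholds in this issue
MIN_RESOLUTION = [(100.0, "species"), (97.0, "genus"), (95.0, "family"),
                  (90.0, "order"), (85.0, "class"), (80.0, "phylum")]

def min_index_for(identity):
    """Minimum resolution index a reference needs at this % identity (None below 80%)."""
    for threshold, rank in MIN_RESOLUTION:  # ordered from highest to lowest threshold
        if identity >= threshold:
            return RANK_INDEX[rank]
    return None

def select_references(hits):
    """Keep the IDs of best hits whose reference is resolved finely enough for their identity."""
    kept = []
    for seq_id, identity, resolution in hits:
        required = min_index_for(identity)
        if required is not None and resolution >= required:
            kept.append(seq_id)
    return kept

hits = [("ref1", 97.0, 8), ("ref2", 97.0, 6), ("ref3", 97.0, 7.5)]
print(select_references(hits))  # ['ref1', 'ref3']
```

What to do when no reference passes the filter (e.g. fall back to all best hits, or leave the variant unassigned) is a separate design choice not covered by this sketch.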