artic-network / rampart

Read Assignment, Mapping, and Phylogenetic Analysis in Real Time
GNU General Public License v3.0

Seqkit backend and other improvements #82

Closed bsipos closed 3 years ago

bsipos commented 4 years ago

Dear rampart developers,

Karhide and I did some tweaks to the backend and the GUI adding the following features:

Please merge if you think it is useful and let us know if you have ideas for improvement. Best, Botond

PS: We are contributing this code as individual developers and not as employees of Oxford Nanopore Technologies.

rambaut commented 3 years ago

Dear Botond, this looks great, but we will need some time to look at it. One initial thought: I am against using 'accuracy' rather than 'identity', because the aim here is to assign reads to different genotypes on the basis of nucleotide similarity. For most viruses at this level, indels are quite rare and are more likely to be sequencing errors.
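To make the distinction concrete, here is a minimal Python sketch of the two measures computed from a CIGAR string and the `NM` (edit distance) tag of a SAM record. The definitions are assumptions chosen for illustration, not taken from the rampart or seqkit code: identity counts only aligned bases and ignores indels, while accuracy also penalizes inserted and deleted bases. The function name `identity_and_accuracy` is hypothetical.

```python
import re
from typing import Tuple

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def identity_and_accuracy(cigar: str, nm: int) -> Tuple[float, float]:
    """Return (identity, accuracy) for a single alignment.

    Assumed definitions (illustrative only):
      identity = matches / (matches + mismatches)
      accuracy = matches / (matches + mismatches + insertions + deletions)
    `nm` is the SAM NM tag: mismatches + inserted bases + deleted bases.
    """
    counts = {"M": 0, "I": 0, "D": 0}
    for length, op in CIGAR_RE.findall(cigar):
        if op in ("=", "X"):
            op = "M"  # treat explicit match/mismatch ops as aligned columns
        if op in counts:  # clips, skips and padding are ignored
            counts[op] += int(length)

    aligned = counts["M"]
    indels = counts["I"] + counts["D"]
    if aligned == 0:
        raise ValueError("alignment has no aligned bases")

    mismatches = nm - indels
    matches = aligned - mismatches
    identity = matches / aligned
    accuracy = matches / (aligned + indels)
    return identity, accuracy


# A read with 2 mismatches and one 2-base deletion.
ident, acc = identity_and_accuracy("50M2D48M", nm=4)
```

In this toy case the deletion lowers the accuracy but leaves the identity untouched, which is the behaviour argued for above when indels are mostly sequencing errors.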

bsipos commented 3 years ago

@rambaut If you want to use the number of matches instead of accuracy, the Dump tool from the seqkit bam -T functionality can dump it to TSV along with the rest of the information needed. I would argue, though, that the best way to assign reads is by alignment score, as it is a proxy for the likelihood of the read belonging to a particular reference, while the accuracy is essentially a length-normalized alignment score.

On the test dataset, using accuracy gave me essentially the same results as the original approach, so it might not have a large impact in practice.
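As a rough illustration of the two ranking criteria discussed here, the Python sketch below assigns each read to the reference with the best hit, using either the raw alignment score or the score divided by alignment length. The `Hit` record, its field names, and the toy values are hypothetical; they are not the columns seqkit writes and not rampart's implementation.

```python
from typing import Callable, Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One read-to-reference alignment (hypothetical record layout)."""
    read: str
    ref: str
    score: float   # alignment score, e.g. the SAM AS tag
    aln_len: int   # alignment length in columns


def assign_reads(hits: Iterable[Hit], key: Callable[[Hit], float]) -> Dict[str, str]:
    """Map each read to the reference whose hit maximizes `key`."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.read)
        if current is None or key(hit) > key(current):
            best[hit.read] = hit
    return {read: hit.ref for read, hit in best.items()}


def by_score(hit: Hit) -> float:
    return hit.score


def by_normalized_score(hit: Hit) -> float:
    return hit.score / hit.aln_len


# Contrived toy data: a long alignment with a higher raw score versus a
# shorter alignment with a higher score per aligned column.
hits = [
    Hit("read1", "refA", score=180.0, aln_len=200),
    Hit("read1", "refB", score=150.0, aln_len=160),
]
raw_assignment = assign_reads(hits, by_score)
normalized_assignment = assign_reads(hits, by_normalized_score)
```

The toy data is deliberately built so the two criteria disagree; on real data, when the competing alignments have similar lengths, both criteria would usually pick the same reference.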