Closed ahfitzpa closed 5 months ago
I think you could get away with something in the range of 70% similarity, based on some work I was doing today.
How many viruses are you looking at in the sample? You could probably get away with using a generic reference for a group of species. That said, if you are looking to differentiate between two very similar species, it might require more thought. Any thoughts @mattloose ?
I will not know how many viruses are in the samples, as it is a virus discovery project across a wide variety of sample types, so I cannot take a host depletion approach. What I am hoping, and will test based on what you are saying, is that AS via ReadFish is pretty permissive, so I can reduce the size of my db by clustering to a specific similarity. I will have fun at the other end of sequencing disentangling similar species anyway due to the ONT error rate, though it is much improved. The size limits of AS are pretty well documented. Do you think that increasing the time a sequence spends in the pore would permit AS of shorter sequences?
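To make the "cluster the db to a specific similarity" idea concrete, here is a minimal toy sketch of greedy centroid clustering. Everything in it is an assumption for illustration: the sequence names and sequences are made up, `difflib` similarity is only a rough stand-in for alignment-based identity, and the 0.70 threshold is just the figure suggested above, not a tested value. For a real database a dedicated clustering tool (CD-HIT or MMseqs2, for example) would be the more sensible route.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Greedy centroid clustering.

    Sequences are visited longest first. Each one joins the first existing
    representative it matches at or above `threshold`; otherwise it becomes
    a new representative.
    """
    clusters: dict[str, list[str]] = {}
    ordered = sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, seq in ordered:
        for rep in clusters:
            if similarity(seqs[rep], seq) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative was similar enough, so start a new cluster.
            clusters[name] = [name]
    return clusters


# Hypothetical toy sequences, not real viral genomes.
seqs = {
    "virus_a": "ACGTACGTACGTACGTACGT",
    "virus_a_variant": "ACGTACGTACCTACGTACGT",  # one substitution vs virus_a
    "virus_b": "TTTTTTTTTTGGGGGGGGGG",
}

clusters = greedy_cluster(seqs, threshold=0.70)
for rep, members in clusters.items():
    print(rep, members)
```

Only the representatives (the dictionary keys) would go into the reference used to build the mmi, which is where the size reduction comes from. Whether reads from the collapsed members still get selected at a given threshold is exactly the thing that would need testing.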
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because there has been no response for 5 days after becoming stale.
I am planning a virus sequencing project using Readfish. Considering the ONT error rate and the adaptive sampling system, what is the necessary percentage similarity between the reference sequence (database) and the target sequence (expected on the flow cell)? Given the diversity of viruses, I would like to avoid an unwieldy mmi input file and also avoid false hits.