Question: Determining Required Percentage Similarity and Handling Database Input

LooseLab / readfish

CLI tool for flexible and fast adaptive sampling on ONT sequencers

https://looselab.github.io/readfish/

GNU General Public License v3.0

Question: Determining Required Percentage Similarity and Handling Database Input #321

Closed ahfitzpa closed 5 months ago

ahfitzpa commented 7 months ago

I am planning a virus sequencing project using Readfish. Considering the ONT error rate and how adaptive sampling works, what percentage similarity is needed between the reference sequence (the database) and the target sequence (what is expected on the flow cell)? Given the diversity of viruses, I would like to avoid an unwieldy mmi input file while also avoiding false hits.

Adoni5 commented 7 months ago

I think you could get away with something in the range of 70% based on some work I was doing today.

How many viruses are you looking at in the sample? You could probably get away with using a generic reference for a group of species. That said, if you are looking to differentiate between two very similar species, it might require more thought. Any thoughts, @mattloose?
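
As a rough illustration of how reference divergence and read error combine, here is a minimal back-of-the-envelope sketch. It is not part of readfish; it assumes a deliberately simple model with substitution errors only, spread uniformly over the three alternative bases, and the function name and the 5% error rate are hypothetical choices for the example. Real ONT errors include indels, so treat the result as an approximate guide only.

```python
def expected_read_identity(ref_identity: float, read_error: float) -> float:
    """Expected per-base identity between a raw read and the reference.

    Assumptions (illustrative only):
      * ref_identity: fraction of positions where the true target genome
        matches the reference in the database.
      * read_error: per-base substitution error rate of the read; indels
        are ignored in this toy model.
    """
    if not (0.0 <= ref_identity <= 1.0 and 0.0 <= read_error <= 1.0):
        raise ValueError("both arguments must be fractions between 0 and 1")
    # Target matches reference and the base was read correctly.
    match_kept = ref_identity * (1.0 - read_error)
    # Target differs from reference, but an error happens to produce
    # the reference base (one of three possible wrong bases).
    match_by_chance = (1.0 - ref_identity) * read_error / 3.0
    return match_kept + match_by_chance


if __name__ == "__main__":
    # Hypothetical numbers: a reference about 70% similar to the target,
    # read with an assumed 5% substitution error rate.
    print(f"{expected_read_identity(0.70, 0.05):.3f}")
```

Under this model the read error rate lowers the effective identity seen by the aligner only slightly compared with the divergence between target and reference, so the choice of reference similarity dominates.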

ahfitzpa commented 6 months ago

I will not know how many viruses are in the samples, as it is a virus discovery project across a wide variety of sample types; therefore I cannot take a host depletion approach. What I am hoping, and will test based on what you are saying, is that AS via Readfish is fairly permissive, so I can reduce the size of my db by clustering to a specific similarity. I will have fun at the other end of sequencing disentangling similar species anyway due to the ONT error rate, though it is much improved. The size limits for AS are pretty well documented. Do you think that increasing the time a sequence spends in the pore would permit AS of shorter sequences?
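
The idea of shrinking the database by clustering to a given similarity can be sketched as a greedy, longest-first clustering that keeps one representative per cluster. This is only a toy illustration: it uses k-mer Jaccard similarity as a stand-in, which is not the same thing as alignment percent identity, and the function names, the value of `k`, and the threshold are all hypothetical. For a real database, a dedicated clustering tool such as CD-HIT or MMseqs2 would be the more usual choice.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int = 8) -> Set[str]:
    """Return the set of k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def greedy_cluster(
    seqs: Dict[str, str], threshold: float, k: int = 8
) -> Dict[str, List[str]]:
    """Greedy clustering: longest sequences become representatives first.

    Each sequence joins the first representative whose k-mer Jaccard
    similarity is >= threshold; otherwise it starts a new cluster.
    Returns a mapping of representative name -> member names.
    """
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    rep_kmers: Dict[str, Set[str]] = {}
    clusters: Dict[str, List[str]] = {}
    for name in order:
        current = kmers(seqs[name], k)
        for rep, rep_set in rep_kmers.items():
            if jaccard(current, rep_set) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative was similar enough: start a new cluster.
            rep_kmers[name] = current
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    # Made-up toy sequences, not real viral genomes.
    toy = {
        "virus_a": "ACGACGACGACGACGACGACGACGA",
        "virus_b": "ACGACGACGACGACGACGACG",
        "virus_c": "TTTTTTTTTTTTTTTTTTTT",
    }
    for rep, members in greedy_cluster(toy, threshold=0.5, k=4).items():
        print(rep, members)
```

Only the representative sequences would then go into the reference used to build the mmi index. How low the threshold can go before targets are missed or false hits appear is exactly the thing that would need testing.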

github-actions[bot] commented 5 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 5 months ago

This issue was closed because there has been no response for 5 days after becoming stale.