NationalGenomicsInfrastructure / anglerfish

Anglerfish - Nanopore reads from Illumina libraries
MIT License

Samplesheet-free mode #57

Open remiolsen opened 10 months ago

remiolsen commented 10 months ago

A separate step, or one run in parallel with the main anglerfish algorithm, that classifies reads not by the adaptors and index setups listed in the input samplesheet, but by clustering reads against all known adaptors and setups. This would be especially useful for cases where we have "unknown unknowns" / unicorns in our sequencing pools.

remiolsen commented 10 months ago

Part of this mode will rely on #59 to determine which adaptor template(s) to map to. The Anglerfish CLI will have to be modified to run without an input samplesheet. More importantly, it needs to cluster the index hits (unmatched indices in current anglerfish terminology) with some allowance for edit distance. Finally, there needs to be some measure of confidence called for each cluster. We will have to put on our statistician hats 🙀
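
To make the clustering step concrete, here is a rough sketch of one way it could work. It is not anglerfish code. The function names (`edit_distance`, `cluster_index_hits`, `cluster_support`) and the `max_dist` parameter are made up for illustration, and the greedy "most abundant sequence becomes the centroid" strategy is only one possible choice. The "support" value is just each cluster's share of all reads, which is a crude placeholder for a proper confidence measure.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]


def cluster_index_hits(hits, max_dist=1):
    """Greedily cluster index sequences (hypothetical helper).

    Sequences are visited from most to least abundant. Each one joins the
    first existing centroid within max_dist edits, otherwise it starts a
    new cluster. Returns {centroid_sequence: total_read_count}.
    """
    counts = Counter(hits)
    # Sort by descending count, then by sequence, so results are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    clusters = {}
    for seq, n in ordered:
        for centroid in clusters:
            if edit_distance(seq, centroid) <= max_dist:
                clusters[centroid] += n
                break
        else:
            clusters[seq] = n
    return clusters


def cluster_support(clusters):
    """Fraction of all reads in each cluster (placeholder, not a statistic)."""
    total = sum(clusters.values())
    return {centroid: n / total for centroid, n in clusters.items()}


if __name__ == "__main__":
    # Made-up index hits, purely for illustration.
    hits = (["ACGTACGT"] * 5 + ["ACGTACGA"] * 2
            + ["TTTTGGGG"] * 3 + ["TTTTGGGC"])
    clusters = cluster_index_hits(hits, max_dist=1)
    for centroid, frac in cluster_support(clusters).items():
        print(centroid, clusters[centroid], round(frac, 2))
```

A greedy pass like this is order-dependent and compares every sequence against every centroid, so it is only meant to show the shape of the problem. A real confidence measure would probably need to account for the expected error rate of the reads.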