bluenote-1577 / fairy

alignment-free coverage calculation for metagenomic binning >100 times faster
MIT License

[Q] is fairy concerned by sketching deserts ? #7

Closed Louis-MG closed 1 week ago

Louis-MG commented 1 week ago

Hello! Thanks for the work you've done, it is going to save a lot of time!

I recently read 'k-nonical space: sketching with reverse complement' (Oxford Bioinformatics). My understanding is that regions of genomes may be ignored by context-free sketching methods, and that hash-based methods are context-free. Fairy is based on FracMinHash, which selects k-mers by their hash value, so it is a context-free method.

My question is: it seems that fairy might be subject to that problem. Do you agree, or is there a nuance that prevents sketching deserts? The results of fairy are very promising, but I would like to know about this potential caveat!

bluenote-1577 commented 1 week ago

Hi @Louis-MG

This is an interesting question. I haven't done any testing on sketching deserts. In practice, I found that FracMinHash is sufficient for larger contigs... perhaps a better k-mer sketching algorithm would improve fairy on smaller contigs.

I'll close this issue since this is the subject of a larger investigation. Thanks for bringing this up though!
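As a rough, back-of-the-envelope illustration of why contig length could matter here, the Python sketch below computes the expected number of sampled k-mers and the chance that a contig contributes no sampled k-mer at all. It rests on an idealized assumption that is not taken from the thread: every k-mer in the contig is distinct and is kept independently with probability 1/c. The function names and the values of `k` and `c` are hypothetical, and real genomes (and the sketching deserts discussed above) can deviate from this model.

```python
def n_kmers(contig_length, k):
    """Number of k-mer start positions in a contig (0 if the contig is shorter than k)."""
    return max(contig_length - k + 1, 0)


def expected_sampled(contig_length, k, c):
    """Expected sketch size if each k-mer is kept with probability 1/c."""
    return n_kmers(contig_length, k) / c


def prob_empty_sketch(contig_length, k, c):
    """Probability that no k-mer is kept, assuming independent selection."""
    return (1.0 - 1.0 / c) ** n_kmers(contig_length, k)


# Compare a few contig lengths under illustrative parameters.
for length in (200, 1_000, 10_000):
    print(length, expected_sampled(length, 21, 50), prob_empty_sketch(length, 21, 50))
```

Under this model the expected sketch size grows linearly with contig length while the chance of an empty sketch shrinks geometrically, which is at least consistent with the observation that larger contigs behave well.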