franztastic opened 3 weeks ago
Hi,
Were you planning to use it on assembled scaffolds?
Are you able to share a bit more information about your data? Is it long-read metagenomics data?
Hi, yes, on assembled scaffolds, but with a different aim: whole-genome assembly, not metagenomics. They are long HiFi reads from 20 individuals of a small arthropod species, and I'm trying to produce the WGA with Hi-C reads as well.
Thanks!
Hi-C reads may not be usable, but there is a chance that binning can be used to separate these species.
First, you need to find some evidence to support the statement: "similar to metagenomics, intra-species oligonucleotide frequencies are similar while inter-species frequencies are different".
You might find this tool useful for doing that screening first, to confirm the hypothesis:
https://github.com/anuradhawick/kmertools
You can see an example in its wiki: https://github.com/anuradhawick/kmertools/wiki/Oligo-nucleotide-computations#example-application (see the diagram that shows the difference).
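A minimal sketch of that screening idea, assuming plain Python and tetranucleotide (k = 4) profiles: each sequence becomes a normalized k-mer frequency vector, and two profiles are compared by Euclidean distance. This is not how kmertools implements it; the function names `kmer_profile` and `profile_distance` are hypothetical and only illustrate the comparison.

```python
import math
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Normalized frequencies of all 4**k ACGT k-mers, in lexicographic order.

    Hypothetical helper for illustration; not the kmertools implementation.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other non-ACGT symbols
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)


def profile_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two k-mer frequency profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

If the hypothesis holds for your data, distances between sequences known to come from the target species should be consistently smaller than distances to sequences from suspected contaminants; if the two distributions overlap heavily, composition-based binning is unlikely to separate them.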
A few remarks: LRBinner may not be applicable right away, because it assumes approximately a million long reads and works best with coverages from 10X to 100X. But if you have these estimates, I am very happy to help.
Let me know.
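As a rough way to obtain the coverage estimate mentioned above, depth can be approximated as total sequenced bases divided by genome size. This is a sketch with hypothetical helper names; the read length and genome size in the example are placeholders, not values from this thread, and the 10X to 100X window simply mirrors the remark above.

```python
def estimate_coverage(n_reads: int, mean_read_length: float, genome_size: float) -> float:
    """Approximate sequencing depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * mean_read_length / genome_size


def in_suggested_range(coverage: float, low: float = 10.0, high: float = 100.0) -> bool:
    """True if the coverage falls inside the 10X-100X window (inclusive)."""
    return low <= coverage <= high


# Placeholder numbers: 4 million reads, a hypothetical 15 kb mean read length,
# and a hypothetical 1.2 Gb genome size.
cov = estimate_coverage(4_000_000, 15_000, 1_200_000_000)
print(cov, in_suggested_range(cov))
```

Substituting your own mean read length and an estimated genome size (for example from a k-mer based estimate) gives a first check of whether the data falls in the range LRBinner expects.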
Oh god, my previous messages were not clear at all. Our data comes from 20 different individuals of the same species, and we want to obtain a whole-genome assembly. As the individuals are really tiny, we are sure that there are a lot of contaminants, so we thought of using LRBinner to remove those contaminants and work only with the bin for our species. Our long reads are PacBio HiFi reads, about 4 million reads, and we assume a coverage of about 50x. We've tried two different approaches.
We assume that LRBinner is not made for decontamination; however, I don't understand why I get such different results.
Sorry for the inconsistency of my earlier messages, and thank you very much for providing these tools. I'll also check the other one you mentioned and study my species a bit further.
OK, I get it now. Sorry.
It's hard to give a straight answer, because in binning we expect good clusters, whereas contamination may or may not form a distinct cluster.
But there might be some luck here, because contamination naturally tends to have very low coverage.
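One way to act on that observation, sketched under assumptions: if you already have a mean coverage per scaffold (for example from mapping the HiFi reads back to the assembly), scaffolds far below the median coverage can be flagged for a closer look. This hypothetical helper is not part of LRBinner, the 0.2 × median cut-off is an arbitrary choice, and a flagged scaffold is only a candidate, not a confirmed contaminant.

```python
from statistics import median


def flag_low_coverage(coverage_by_scaffold: dict[str, float], fraction: float = 0.2) -> list[str]:
    """Return scaffold names whose coverage is below `fraction` x the median coverage.

    Hypothetical helper; the default fraction is an arbitrary assumption to tune per dataset.
    """
    if not coverage_by_scaffold:
        return []
    cutoff = fraction * median(coverage_by_scaffold.values())
    return sorted(name for name, cov in coverage_by_scaffold.items() if cov < cutoff)


# Made-up example coverages: three scaffolds near 50x and one far below.
example = {"scaf1": 48.0, "scaf2": 52.0, "scaf3": 50.0, "scaf4": 3.5}
print(flag_low_coverage(example))
```

Using the median rather than the mean keeps the cut-off from being pulled down by many small, low-coverage scaffolds.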
Hello everyone,
I've used your tool in several metagenomics projects, and now, as suggested by my supervisor, I've used it in a WGA as a decontamination tool after scaffolding.
Afterwards, I tried manual curation with Pretext, and it seems that there are no contacts between my scaffolds, and I end up with 33 different chromosomes.
However, when I ran LRBinner and then YAHS, my results were completely different: I now have 17 chromosomes with a lot of contact between my scaffolds, but with very low coverage.
I understand that the tool is not designed for decontamination after scaffolding, but I'm wondering why the results are so different.
Thank you very much for your answer!