DessimozLab / read2tree

a tool for inferring species tree from sequencing reads
MIT License

[Question] Are very short insert sizes an issue? #37

Closed: mptrsen closed this issue 8 months ago

mptrsen commented 11 months ago

I'm going to process a bunch of sequencing datasets from museum samples, some of which were stored under horrendous conditions for decades. As a result, the DNA fragments are very short, often shorter than the read length, so we will have many read pairs that overlap partly or fully.

[Figure: FL_plots]

Is this an issue for the local assembly step, i.e., will read2tree work with such samples?
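The arithmetic behind "overlap partly or fully" can be sketched as below. This is a minimal illustration, assuming both mates have the same length and that "insert size" means the length of the original DNA fragment without adapters; the function names are hypothetical and not part of read2tree.

```python
def mate_overlap(insert_size: int, read_length: int) -> int:
    """Number of fragment bases read by both mates of a pair.

    Assumes equal-length mates and an insert size equal to the
    fragment length (adapters excluded).
    """
    if insert_size <= 0 or read_length <= 0:
        raise ValueError("insert size and read length must be positive")
    # A mate cannot read more of the fragment than the fragment itself.
    covered = min(read_length, insert_size)
    # Two mates reading `covered` bases from opposite ends of a fragment
    # of `insert_size` bases share 2*covered - insert_size bases (>= 0).
    return max(0, 2 * covered - insert_size)


def adapter_readthrough(insert_size: int, read_length: int) -> int:
    """Bases per mate that run past the fragment end into the adapter."""
    return max(0, read_length - insert_size)
```

For example, with 150 bp reads, a 300 bp fragment gives no overlap, a 200 bp fragment gives a 100 bp partial overlap, and a 100 bp fragment is read completely by both mates, with 50 bases of each mate running into the adapter.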

alpae commented 11 months ago

Dear @mptrsen

We don't have experience with such degraded samples, but I suggest just giving it a try. In our experience, the coverage needed is very low, so I would hope that read2tree can deal with this.

mptrsen commented 11 months ago

Yes, coverage won't be an issue, just the fragmentation. I understand from the read2tree paper that the "local assembly" step consists of taking the consensus sequence from the mapped reads. So the insert size should be irrelevant to the method, because there is no "traditional" assembler in the pipeline that would use the insert size metric. Is that correct?
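A consensus of mapped reads can be sketched as a per-position majority vote. This is a generic, simplified illustration and not read2tree's actual code: it assumes gapless alignments given as hypothetical `(start, sequence)` pairs with 0-based starts, and it ignores indels and base qualities, which a real implementation may use.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def consensus(ref_length: int,
              alignments: Iterable[Tuple[int, str]],
              min_depth: int = 1) -> str:
    """Majority-vote consensus over gapless alignments to a reference."""
    columns: List[Counter] = [Counter() for _ in range(ref_length)]
    for start, seq in alignments:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            # Skip bases hanging off the reference and ambiguous calls.
            if 0 <= pos < ref_length and base != "N":
                columns[pos][base] += 1
    out = []
    for col in columns:
        if sum(col.values()) < min_depth:
            out.append("N")  # too little coverage to call a base
        else:
            # Ties go to the base seen first (Counter.most_common ordering).
            out.append(col.most_common(1)[0][0])
    return "".join(out)
```

In this toy version the fragment length never enters the calculation; only per-position coverage does, which is consistent with the reasoning above. One side effect it makes visible: when two mates overlap, the overlapping bases come from the same molecule but are counted twice toward `min_depth`.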

sinamajidian commented 10 months ago

Thanks. We also have a step that maps the sequencing reads onto the orthologous groups using NextGenMap (NGM), a mapper for short reads. You can check its wiki. We could adjust some parts of our code so that you can pass NGM arguments through read2tree.

mptrsen commented 10 months ago

@sinamajidian Thanks for the info. Since read2tree uses neither the -I/--min-insert-size nor the -X/--max-insert-size option for NGM, I assume that it runs with the defaults, i.e., an insert size between 0 and 1000 bp.
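How the insert-size options could be passed through, if read2tree exposed them as suggested above, can be sketched as follows, assuming a paired-end run. The helpers `ngm_paired_command` and `within_insert_bounds` are hypothetical and not part of read2tree. Only `-I` and `-X` (with the 0 and 1000 bp defaults) come from this thread; the `-r`, `-1`, `-2`, and `-o` options are assumed from NGM's usage and should be checked against `ngm --help` for the installed version.

```python
from typing import List, Optional

# Defaults stated in this thread for NGM's -I/--min-insert-size
# and -X/--max-insert-size options.
DEFAULT_MIN_INSERT = 0
DEFAULT_MAX_INSERT = 1000


def ngm_paired_command(reference: str, mate1: str, mate2: str, output: str,
                       min_insert: Optional[int] = None,
                       max_insert: Optional[int] = None) -> List[str]:
    """Hypothetical helper: build an NGM paired-end command line.

    Insert-size flags are only added when explicitly requested, so
    leaving them out keeps NGM on its own defaults.
    """
    cmd = ["ngm", "-r", reference, "-1", mate1, "-2", mate2, "-o", output]
    if min_insert is not None:
        cmd += ["-I", str(min_insert)]
    if max_insert is not None:
        cmd += ["-X", str(max_insert)]
    return cmd


def within_insert_bounds(insert_size: int,
                         min_insert: int = DEFAULT_MIN_INSERT,
                         max_insert: int = DEFAULT_MAX_INSERT) -> bool:
    """True if an insert size lies inside [min_insert, max_insert]."""
    return min_insert <= insert_size <= max_insert
```

For example, `ngm_paired_command("og.fa", "r1.fq", "r2.fq", "out.sam", max_insert=500)` appends `-X 500`, while omitting both keyword arguments leaves the insert-size flags out entirely. With the default window, even a 30 bp museum fragment satisfies `within_insert_bounds(30)`, so very short inserts fall inside the range NGM would use by default.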

Out of interest, why use NextGenMap instead of a different mapper such as BWA, Bowtie, or BBMap? (The paper gives little detail about the methods.)

sinamajidian commented 10 months ago

You're welcome.

The project started a few years ago, and NGM was one of the best mappers at the time. Besides, Fritz, a member of the read2tree team, was one of NGM's developers.

Best regards, Sina