BinPro / CONCOCT

Clustering cONtigs with COverage and ComposiTion

Coverage calculation from kallisto #273

Closed adityabandla closed 5 years ago

adityabandla commented 5 years ago

I am trying to use Kallisto to estimate coverage of my assembled contigs (paired-end library). I had a look at the script here https://github.com/EnvGen/toolbox/blob/master/scripts/kallisto_concoct/input_table.py

The estimated read counts are divided by the contig length, as usual, and the result is then multiplied by 200. What exactly does the 200 represent? I realise the value of the constant shouldn't really matter.

alneberg commented 5 years ago

Hi @adityabandla,

good that you found your way to that script! Please have a look at the rest of the calculation and make sure that you agree with it as well. The number 200 is supposed to correspond to the total read length, so that the result is comparable to coverage calculated the traditional way. Honestly though, I've used it with several different read lengths.

Good luck, Johannes
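
For readers following along, the calculation discussed above can be sketched roughly as below. This is a minimal illustration, not the linked `input_table.py` script itself: the function name `kallisto_coverage` and the parameter `read_pair_length` are hypothetical, the example table is made up, and the default of 200 simply mirrors the constant mentioned in this thread. The column names (`target_id`, `length`, `est_counts`) are those of kallisto's `abundance.tsv` output.

```python
import csv
import io


def kallisto_coverage(abundance_tsv, read_pair_length=200):
    """Approximate per-contig coverage from a kallisto abundance table.

    coverage ~= est_counts * read_pair_length / contig_length

    `read_pair_length` is an assumed constant standing in for the total
    number of bases per sequenced fragment (the "200" from the thread).
    """
    coverage = {}
    reader = csv.DictReader(abundance_tsv, delimiter="\t")
    for row in reader:
        length = float(row["length"])          # contig length in bp
        est_counts = float(row["est_counts"])  # estimated fragment count
        coverage[row["target_id"]] = est_counts * read_pair_length / length
    return coverage


# Made-up example table in the abundance.tsv layout.
example = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "contig_1\t1000\t801\t50\t10.0\n"
    "contig_2\t4000\t3801\t20\t1.0\n"
)
cov = kallisto_coverage(example)
for contig, value in cov.items():
    print(contig, value)
```

Changing `read_pair_length` rescales every contig by the same factor, so the ratios between contigs stay the same.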

adityabandla commented 5 years ago

Thanks @alneberg. It makes sense now, but I assume this only holds when the two paired-end reads do not overlap.

alneberg commented 5 years ago

Yes, you are right. This is only an approximation, as we usually trim the reads beforehand as well. But as you say, the value of the constant is not very important. I've mostly used it to get the scatter plot of kallisto quantification against regular coverage quantification on the same scale.

adityabandla commented 5 years ago

Thanks a ton!