GATB / simka

Simka and SimkaMin are comparative metagenomics methods dedicated to NGS datasets.
https://gatb.inria.fr/software/simka/
GNU Affero General Public License v3.0

Add option to trim all reads to a given length #19

Closed fplazaonate closed 3 years ago

fplazaonate commented 3 years ago

Hi,

I would like to run Simka on multiple samples with the -max-reads option to deal with varying sequencing depths. However, the samples also have varying read lengths. I guess this may slightly bias the results, as longer reads increase the total number of k-mers. Would it be possible to add an option to trim all reads to a given length?

Florian
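To illustrate the concern raised above: a read of length L contains L - k + 1 k-mers, so with the same number of reads per sample, samples with longer reads contribute more k-mers. The sketch below is only an illustration; the helper name `kmers_per_read`, the k-mer size of 21, the read lengths, and the read count are assumptions chosen for the example, not values taken from this thread.

```python
def kmers_per_read(read_len, k):
    """Number of k-mers in a single read of length read_len (0 if the read is shorter than k)."""
    return max(read_len - k + 1, 0)


def total_kmers(n_reads, read_len, k):
    """Total k-mers in a sample of n_reads reads, assuming all reads have the same length."""
    return n_reads * kmers_per_read(read_len, k)


if __name__ == "__main__":
    k = 21             # assumed k-mer size, for illustration only
    n_reads = 1000000  # same read cap applied to both hypothetical samples
    for read_len in (100, 150):
        print(read_len, total_kmers(n_reads, read_len, k))
```

With these assumed values, the 150 bp sample yields 130 k-mers per read against 80 for the 100 bp sample, even though both samples are capped at the same number of reads.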

clemaitre commented 3 years ago

Hi Florian,

Thank you for using Simka and for your interesting comment. I agree that it would make more sense to also trim the reads to a common length, so this would be a relevant option for Simka. However, I looked in the code, and this feature would not belong to Simka's own code but rather to the GATB library it is built on. I'm sorry, but at the moment we do not have enough human resources on GATB to implement it.

Therefore, my only answer, which is not ideal, is to trim the reads beforehand as a pre-processing step with an independent tool (such as seqtk trimfq).

Regards, Claire
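For readers who want to see what the pre-processing step suggested above amounts to, here is a minimal Python sketch that truncates every read of a FASTQ file to a common maximum length before running Simka. It is not part of Simka or GATB, and it is not a replacement for a dedicated tool such as seqtk. The function name `trim_fastq` and the `drop_shorter` parameter are hypothetical, and the sketch assumes the common 4-line-per-record FASTQ layout (no wrapped sequences).

```python
import io


def trim_fastq(in_handle, out_handle, max_len, drop_shorter=False):
    """Truncate each read of a 4-line-per-record FASTQ stream to max_len bases.

    If drop_shorter is True, reads shorter than max_len are discarded so that
    all written reads have exactly the same length.
    Returns the number of records written.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    written = 0
    while True:
        header = in_handle.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # skip blank lines, e.g. at the end of the file
        seq = in_handle.readline().rstrip("\r\n")
        in_handle.readline()  # separator line starting with "+", content ignored
        qual = in_handle.readline().rstrip("\r\n")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ in " + header.strip())
        if drop_shorter and len(seq) < max_len:
            continue
        # Keep the first max_len bases and the matching quality values.
        out_handle.write(header.rstrip("\r\n") + "\n")
        out_handle.write(seq[:max_len] + "\n+\n")
        out_handle.write(qual[:max_len] + "\n")
        written += 1
    return written


if __name__ == "__main__":
    # Tiny in-memory demonstration with two made-up reads.
    src = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACG\n+\nIII\n")
    dst = io.StringIO()
    trim_fastq(src, dst, 5)
    print(dst.getvalue(), end="")
```

On real data, the handles would typically come from `open(path)` or, for compressed files, `gzip.open(path, "rt")`. Whether to keep or drop reads shorter than the target length is a design choice: dropping them makes read length uniform, at the cost of discarding data.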