bcgsc / transabyss

de novo assembly of RNA-seq data using ABySS

Transabyss default param setting issue #17

Closed lsterck closed 5 years ago

lsterck commented 5 years ago

Hi,

I'm running transabyss v2.0.1 with default settings. When checking the run logs, I was a little surprised to see the following message: "warning: the seed-length should be at least twice k: k=32, s=32". Indeed, the default for s is set to k, as it says in the manual/help. Would it therefore not be better to set the default for s to k*2?
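For readers unfamiliar with the warning, the condition it describes can be sketched in a few lines of Python. This is only an illustration of the rule stated in the message (s should be at least 2*k); the function name check_seed_length is hypothetical and this is not the actual ABySS or Trans-ABySS source code.

```python
from typing import Optional


def check_seed_length(k: int, s: int) -> Optional[str]:
    """Return a warning string if s is less than twice k, else None.

    Hypothetical helper: it mirrors the wording of the log message quoted
    above, not the real implementation inside ABySS.
    """
    if s < 2 * k:
        return f"warning: the seed-length should be at least twice k: k={k}, s={s}"
    return None


# The Trans-ABySS default described in this thread (s == k) trips the check.
default_warning = check_seed_length(k=32, s=32)

# Setting s to 2*k, as proposed in the question, does not.
doubled_warning = check_seed_length(k=32, s=64)
```

With k=32, the first call returns the warning text and the second returns None.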

thx.

sjackman commented 5 years ago

That warning message is intended for genomic assembly, to prevent assembling duplicate sequences, which is less of a concern for transcriptome assembly.

lsterck commented 5 years ago

Hi @sjackman ,

OK, fair enough, thanks. Would it make sense to change s (i.e., increase it) to avoid assembling paralogous (and/or recently duplicated) genes together?

sjackman commented 5 years ago

If that were your preference, then yes, you could increase s to decrease over-assembly of paralogous sequence.

kmnip commented 5 years ago

The s parameter was set to k intentionally (within Trans-ABySS) during the paired-end contig assembly stages because many unitigs with length equal to k are part of the correct path for assembly into transcripts. This is particularly true for highly expressed transcripts.
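The trade-off described here can be illustrated with a toy example. The sketch below assumes that s acts as a minimum unitig length for the paired-end contig stage, which is how the comment above describes it; the function seed_unitigs and the unitig lengths are made up for illustration and are not taken from Trans-ABySS.

```python
from typing import List


def seed_unitigs(unitig_lengths: List[int], s: int) -> List[int]:
    """Keep only unitigs whose length is at least s (hypothetical filter)."""
    return [length for length in unitig_lengths if length >= s]


k = 32

# Made-up unitig lengths; two of them are exactly k bases long.
lengths = [32, 32, 45, 64, 120]

# With s == k (the Trans-ABySS default discussed here), every unitig is kept.
kept_with_s_equal_k = seed_unitigs(lengths, s=k)

# With s == 2*k, the unitigs shorter than 64 bases are dropped,
# including the two that are exactly k long.
kept_with_s_double_k = seed_unitigs(lengths, s=2 * k)
```

In this toy data, raising s from k to 2*k removes three of the five unitigs, including both of the length-k ones. That is the loss the default is meant to avoid, at the cost of possibly joining paralogous sequence.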