Arcadia-Science / peptigate

Peptigate ("peptide" + "investigate") predicts bioactive peptides from transcriptome assemblies or sets of proteins.
MIT License

Consider requiring one file as input for the transcriptome assembly instead of splitting between short and long transcripts #42

Closed taylorreiter closed 3 months ago

taylorreiter commented 4 months ago

I initially wrote peptigate to be maximally compatible with reads2transcriptome (r2t). r2t outputs two files for its transcriptome assembly:

  1. The actual transcriptome assembly
  2. Very short contigs that are discarded early on. The user can control this cutoff in r2t; we were setting it at 75 bp. Most people would probably tell their assemblers not to output contigs below 300 or 500 bp.

Because of this, I thought accepting two files would make it easier to run peptigate on r2t outputs: no concatenation preprocessing step would be needed in between to screen really short transcripts for ORFs. But r2t has ended up not being awesome for a variety of reasons, and we aren't really using it.
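
For anyone who does still have two r2t-style files, the concatenation preprocessing step mentioned above is small. Here is a minimal sketch in plain Python. The function name and the file names in the usage comment are hypothetical and not part of peptigate or r2t.

```python
def concatenate_fasta(input_paths, output_path):
    """Write the contents of several FASTA files into one output file.

    Files are streamed line by line so large assemblies are not loaded
    into memory. If an input file does not end with a newline, one is
    added so the next file's first header starts on its own line.
    """
    with open(output_path, "w") as out:
        for path in input_paths:
            last_line = "\n"  # treat an empty file as already terminated
            with open(path) as handle:
                for line in handle:
                    out.write(line)
                    last_line = line
            if not last_line.endswith("\n"):
                out.write("\n")
    return output_path


# Hypothetical usage: file names are placeholders, not real r2t outputs.
# concatenate_fasta(["assembly.fa", "short_contigs.fa"], "combined.fa")
```

With a single combined file, the short contigs would still be available for ORF screening, and only one input would need to be documented.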

This means we end up with two input files where one would probably suffice -- the first step of the pipeline that deals with these files combines them anyway. This decision and the required input files are really hard to document, and it ends up not mattering for the pipeline. So, we should consider only having one input file. Things that would need to be updated: