I initially wrote peptigate to be maximally compatible with reads2transcriptome (r2t). r2t outputs two files for its transcriptome assembly:
1. The actual transcriptome assembly.
2. Very short contigs that are discarded early on. The user can control this cutoff in r2t; we were setting it at 75 bp. Most people would probably tell their assemblers not to output contigs below 300 or 500 bp.
Because of this, I thought accepting two files would make it easier to run peptigate on r2t outputs: no concatenation preprocessing step would be needed in between to screen the really short transcripts for ORFs. But r2t has ended up not being awesome for a variety of reasons, and we aren't really using it.
This means we end up with two input files where one would probably suffice -- the first step of the pipeline that deals with these files combines them anyway. This decision and the required input files are really hard to document, and it ends up not mattering for the pipeline. So, we should consider only having one input file. Things that would need to be updated:
[x] Snakefile: remove concat rule, remove references to variable for short contigs file
[x] config: remove short contigs file path from config
[x] demo: remove short contigs file
[x] readme: update documentation about input files and configs
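For anyone who still has the two r2t outputs, the concatenation can happen before running peptigate instead of inside it. A minimal sketch, assuming plain uncompressed FASTA files; the function name and the file names in the usage comment are hypothetical and not part of peptigate or r2t:

```python
def concat_fasta(input_paths, output_path):
    """Concatenate FASTA files into a single file, in the order given.

    Streams line by line so large assemblies are not loaded into memory,
    and adds a newline after any input that lacks a trailing one so the
    next file's first header starts on its own line.
    """
    with open(output_path, "w") as out:
        for path in input_paths:
            last_line = "\n"  # treat an empty input as already terminated
            with open(path) as src:
                for line in src:
                    out.write(line)
                    last_line = line
            if not last_line.endswith("\n"):
                out.write("\n")


# Hypothetical usage: merge the assembly and the short contigs, then point
# the single input path in the peptigate config at the combined file.
# concat_fasta(["assembly.fa", "short_contigs.fa"], "combined.fa")
```

This does no deduplication or header checking; it only reproduces what a plain concatenation step would do.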