Sung-Huan / ANNOgesic

ANNOgesic - A Swiss army knife for the RNA-Seq based annotation of bacterial/archaeal genomes
http://annogesic.readthedocs.io/en/latest/index.html

TSSpredator parameter defaults #12

Closed apredeus closed 4 years ago

apredeus commented 4 years ago

Hello Sung-Huan,

I'm trying to run some stats on predicted TSSs with different TSSpredator parameters, but I can't locate the parameter values in the logs. Right now I'm doing regular tss_ps runs without any "manual" files. I think the defaults have changed since I updated the version (1.0.1 -> 1.10), but I can't find the actual parameters in the output files.

So, the header of gff files in /MasterTable folder has the following line:

parameters 0.3 2.0 0 0.0 2.0 0.9 3 HIGHEST 1 1

What are these numbers? More generally, I can't find which log the TSSpredator parameters are printed in.

Thank you in advance!

Sung-Huan commented 4 years ago

Hi,

I know the header of the file is hard to understand. I also had to spend some time figuring it out.

You can check the user guide of TSSpredator. https://uni-tuebingen.de/index.php?eID=tx_securedownloads&p=143256&u=0&g=0&t=1581674637&hash=4a42178f845907221b05478fc6cfd6d31afe4c8d&file=/fileadmin/Uni_Tuebingen/Fakultaeten/InfoKogni/WSI/IntegTranskript/Softwareprojekte/TSSPredator/TSSpredator_Guide.pdf

Pages 4-8 describe these parameters. Based on your example, the values are as follows:

0.3: Step height (the same as "height" in ANNOgesic).
2.0: Step factor (the same as "factor" in ANNOgesic).
0: Step length. This is a rather special parameter that is seldom used; it is not exposed as a parameter in ANNOgesic.
0.0: Base height (the same as "base height" in ANNOgesic).
2.0: Enrichment factor (the same as "enrichment factor" in ANNOgesic).
0.9: Normalization percentile. The default works well in most cases, so it is not exposed as a parameter in ANNOgesic.
3: TSS cluster distance (the same as "cluster" in ANNOgesic).
HIGHEST: Cluster method, described on page 8 (TSS Clustering Settings). HIGHEST keeps only the highest peak in a cluster as the TSS; FIRST is the alternative. In general, HIGHEST performs better. It is not exposed as a parameter in ANNOgesic.
1: Allowed cross-replicate shift. Currently, ANNOgesic only uses the default setting. It may become an ANNOgesic parameter in the future.
1: Matching replicates (similar to, but not exactly the same as, "replicate_tex" in ANNOgesic).
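The mapping above can be turned into a small helper for labeling the values in a MasterTable header. This is only a minimal sketch: the field names are labels made up for this example (they are not identifiers used by TSSpredator or ANNOgesic), and it assumes the line consists of the word "parameters" followed by exactly ten whitespace-separated values in the order listed above, optionally preceded by "#" characters.

```python
from typing import Dict, Optional

# Labels chosen for this sketch, in the order given in the reply above.
# They are NOT official TSSpredator or ANNOgesic identifiers.
FIELD_NAMES = [
    "step_height",
    "step_factor",
    "step_length",
    "base_height",
    "enrichment_factor",
    "normalization_percentile",
    "cluster_distance",
    "cluster_method",
    "cross_replicate_shift",
    "matching_replicates",
]


def parse_parameter_line(line: str) -> Optional[Dict[str, str]]:
    """Return a label -> value mapping for a 'parameters ...' line.

    Returns None if the line is not a parameters line. Values are kept as
    strings so that nothing is assumed about their numeric types.
    """
    # Assumption: the line may or may not start with '#' comment markers.
    tokens = line.strip().lstrip("#").split()
    if not tokens or tokens[0].rstrip(":").lower() != "parameters":
        return None
    values = tokens[1:]
    if len(values) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} values, found {len(values)}"
        )
    return dict(zip(FIELD_NAMES, values))


def find_parameters(gff_path: str) -> Optional[Dict[str, str]]:
    """Scan a file and return the first parsed parameters line, if any."""
    with open(gff_path) as handle:
        for line in handle:
            parsed = parse_parameter_line(line)
            if parsed is not None:
                return parsed
    return None
```

Calling `parse_parameter_line("parameters 0.3 2.0 0 0.0 2.0 0.9 3 HIGHEST 1 1")` labels 0.3 as the step height and HIGHEST as the cluster method. If a different TSSpredator version writes more or fewer values, the function raises an error instead of guessing.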

If you want to see your settings for TSSpredator, you can open the config file, which is stored in "$ANNOgesic_FOLDER/output/TSSs/configs/". However, the parameter names there are not exactly the same as in the user guide, although it is possible to figure them out.
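To compare settings across runs, the config files in that folder can be dumped with a few lines of Python. This is a hedged sketch only: it assumes the config files are plain text with one "key = value" pair per line, which is not confirmed in this thread, and it makes no attempt to translate the key names into the user guide's terminology. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict


def parse_key_value_text(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and '#' comments.

    Assumption: the config uses one 'key = value' pair per line. Lines
    without '=' are ignored rather than treated as errors.
    """
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def read_configs(config_dir: str) -> Dict[str, Dict[str, str]]:
    """Return {file name: parsed settings} for every file in config_dir."""
    result: Dict[str, Dict[str, str]] = {}
    for path in sorted(Path(config_dir).iterdir()):
        if path.is_file():
            result[path.name] = parse_key_value_text(path.read_text())
    return result
```

Pass the real location of the configs folder (the "$ANNOgesic_FOLDER/output/TSSs/configs/" path above, with the placeholder replaced by your project folder) to `read_configs`, then print or diff the returned dictionaries.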

Best,

Sung-Huan

apredeus commented 4 years ago

Thank you! Am I right to understand that if there are no replicates, three parameters (height, factor, and enrichment factor) define the TSS calls?

I'm also struggling to understand the height setting of 0.3 - the manual defines it as the number of reads starting at a particular position. Is it fractional because of normalization?

Thank you

Sung-Huan commented 4 years ago

Actually, more parameters work together to define TSSs, and not all of them are written in the gff files. The ones I use for optimization are height, height reduction, factor, factor reduction, base height, enrichment factor and processing factor. Since I am not the developer of TSSpredator, I do not know why the parameter names differ between the User Guide and the output files. Height is not an integer because of the normalization. However, I am also not sure how they normalize the data. Sorry for the lack of information. If you want to know the details, perhaps you can contact the TSSpredator team.

Best, Sung-Huan

apredeus commented 4 years ago

No problem - thank you very much for your help! I appreciate it.