How to set -d and --tolerance during grouping step

COMBINE-lab / terminus

BSD 3-Clause "New" or "Revised" License

How to set -d and --tolerance during grouping step #5

Closed diriano closed 1 year ago

diriano commented 4 years ago

Do you have any recommendations for setting the parameters -d and --tolerance? I am working with sugarcane (polyploid) RNA-Seq data covering several conditions (2 genotypes x 2 substrate conditions x 4 leaf parts x 3 replicates). Thanks

hiraksarkar commented 4 years ago

Hi @diriano

Thanks for using Terminus. Let me try to answer your questions in parts. The --tolerance option controls how sensitive the Terminus algorithm is when two transcripts within an equivalence class have similar scores in terms of alignment probabilities, that is, how close the scores must be for the transcripts to be regarded as equally probable. For example, in an equivalence class with two transcripts whose probabilities are 0.499.. and 0.501.., the two would be considered equally probable under a tolerance of 0.01. In general, it determines how sensitive you want Terminus to be to differences in alignment probability. In our experience, a value of 0.01 works well.
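To make the tolerance idea concrete, here is a minimal Python sketch of the comparison described above. It is only an illustration: the function name `equally_probable` is hypothetical, and treating the tolerance as an absolute difference with a non-strict comparison is an assumption, not a statement about how Terminus implements it internally.

```python
def equally_probable(p1: float, p2: float, tolerance: float = 0.01) -> bool:
    """Return True if two alignment probabilities differ by at most `tolerance`.

    Assumption: the tolerance is an absolute difference; the actual
    comparison inside Terminus may be defined differently.
    """
    return abs(p1 - p2) <= tolerance


# The example from the comment: 0.499 vs 0.501 differ by about 0.002.
close_pair = equally_probable(0.499, 0.501)   # within a tolerance of 0.01
far_pair = equally_probable(0.45, 0.55)       # differ by 0.1, outside 0.01
```

With a smaller tolerance, fewer pairs are treated as ties, so the result becomes more sensitive to small differences in alignment probability.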

The -d option of the group step specifies the input folders where the salmon output should be. In short, a good way to check is to look for quant.sf in that directory. Hope that helps.
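The quant.sf check suggested above can be scripted before running the group step. This is a small sketch using only the Python standard library; the helper name `missing_quant_sf` and the directory names in the usage comment are hypothetical placeholders.

```python
from pathlib import Path


def missing_quant_sf(salmon_dirs):
    """Return the directories that do NOT contain a quant.sf file.

    `salmon_dirs` is any iterable of paths intended to be passed via -d.
    """
    dirs = [Path(d) for d in salmon_dirs]
    return [d for d in dirs if not (d / "quant.sf").is_file()]


# Usage (hypothetical paths):
#   bad = missing_quant_sf(["salmon_out/sample1", "salmon_out/sample2"])
#   if bad:
#       print("No quant.sf found in:", bad)
```

An empty return value means every directory looks like a salmon output folder, at least as far as this one file is concerned.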

diriano commented 4 years ago

Hi @hiraksarkar,

thanks for your reply. I meant --min-spread instead of -d. Could you please comment on choosing --min-spread? Thanks

hiraksarkar commented 4 years ago

Hi @diriano ,

Sure, --min-spread sets the threshold for the initial filtering step, which removes transcripts that do not show a large fluctuation/variation in their posterior estimates. In short, the spread is defined as (maximum - minimum) / mean. We used 0.05 as the minimum spread. If you use a higher value, Terminus will only consider transcripts with a wider posterior distribution; in other words, it will consider fewer transcripts for grouping.
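The spread formula above can be worked through on a toy example. The sketch below is a hedged illustration, not the Terminus implementation: the function names are hypothetical, the input is assumed to be a list of posterior samples for one transcript, and both the non-strict comparison and the handling of a zero mean are assumptions.

```python
from statistics import mean


def spread(samples):
    """(maximum - minimum) / mean of one transcript's posterior samples."""
    m = mean(samples)
    if m <= 0:
        # Assumption: a transcript with zero mean is treated as having no spread.
        return 0.0
    return (max(samples) - min(samples)) / m


def passes_min_spread(samples, min_spread=0.05):
    """True if the transcript would survive the initial filter (assumed >=)."""
    return spread(samples) >= min_spread


narrow = [99, 100, 101]   # spread = 2 / 100 = 0.02
wide = [95, 100, 105]     # spread = 10 / 100 = 0.1

keep_narrow = passes_min_spread(narrow)   # 0.02 is below 0.05
keep_wide = passes_min_spread(wide)       # 0.1 is above 0.05
```

Here the narrow transcript is filtered out at a minimum spread of 0.05, while the wide one remains a candidate for grouping.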

vragh commented 4 years ago

@hiraksarkar is that 0.05 being suggested as the default value, or as the lowest you'd recommend going with --min-spread? I'm asking because terminus 0.1.0 has that parameter set to 0.1 by default.

rob-p commented 4 years ago

@hiraksarkar, we should document this parameter better and make sure the default matches our general recommendations (while acknowledging it's a free parameter, and so users may have to explore a bit with their data).

hiraksarkar commented 4 years ago

@vragh Apologies for the confusion. The default value is still set to 0.1 instead of 0.05. To correct it, we need to push a new version to Conda. Until that is done, please use 0.05. As @rob-p mentioned, exploring the data with different values of --min-spread is also a good idea. Thanks for pointing it out.
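One simple way to explore different --min-spread values is to count how many transcripts would remain candidates at each threshold. The sketch below does this on made-up numbers; the transcript names and samples are toy data, the helper names are hypothetical, and the `>=` comparison is an assumption about how the filter is applied.

```python
from statistics import mean


def spread(samples):
    """(maximum - minimum) / mean, with zero-mean transcripts given spread 0."""
    m = mean(samples)
    return (max(samples) - min(samples)) / m if m > 0 else 0.0


def count_candidates(posterior, thresholds):
    """Map each threshold to the number of transcripts with spread >= threshold."""
    return {
        t: sum(spread(s) >= t for s in posterior.values())
        for t in thresholds
    }


# Toy posterior samples per transcript (not real data).
toy = {
    "tx1": [99, 100, 101],   # spread 0.02
    "tx2": [96, 100, 104],   # spread 0.08
    "tx3": [80, 100, 120],   # spread 0.40
}

counts = count_candidates(toy, [0.05, 0.1, 0.2])
```

In this toy case, moving the threshold from 0.05 to 0.1 drops tx2 from consideration, which is the kind of effect worth checking on real data before settling on a value.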

@rob-p I guess we need to push this correction to Conda version 0.1.1. We did not cover this option in our tutorial; I will send a correction there. Thanks for the heads up.