BackofenLab / GraphClust-2

A pipeline for structural clustering of RNA secondary structures
GNU General Public License v3.0

Cmsearch output threshold #30

Closed: mmiladi closed this issue 7 years ago

mmiladi commented 7 years ago

The cmsearch step has two types of thresholds: reporting and inclusion. The reporting threshold is very permissive by default (E-value 10) and should not be left at its default if the reported hits are used as input for the next step. @eteriSokhoyan Please have a look at this.
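For context: in Infernal, cmsearch uses the reporting threshold (`-E`, default E-value 10) to decide which hits are printed at all, and the inclusion threshold (`--incE`, default 0.01) to decide which of those count as significant. A minimal Python sketch of that classification, assuming hits are already parsed into (name, E-value) records; the `Hit` type and `classify` function are hypothetical and not part of Infernal or GraphClust.

```python
from dataclasses import dataclass

# cmsearch defaults (Infernal 1.1): -E for reporting, --incE for inclusion.
REPORT_E = 10.0
INCLUDE_E = 0.01


@dataclass
class Hit:
    # Hypothetical minimal hit record; real cmsearch output has many more fields.
    target: str
    evalue: float


def classify(hit: Hit, report_e: float = REPORT_E, include_e: float = INCLUDE_E) -> str:
    """Return 'included', 'reported' or 'dropped' for a single hit."""
    if hit.evalue <= include_e:
        return "included"   # significant hit ('!' in the tabular inclusion column)
    if hit.evalue <= report_e:
        return "reported"   # printed but not significant ('?')
    return "dropped"        # above the reporting threshold, not printed


hits = [Hit("seqA", 1e-8), Hit("seqB", 0.5), Hit("seqC", 25.0)]
for h in hits:
    print(h.target, classify(h))
```

With the defaults, `seqB` (E-value 0.5) still appears in the output even though it is not significant, which is what any downstream step sees if it consumes every reported hit.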

mmiladi commented 7 years ago

Actually, there are two wrappers to check: the CM search and the collect-results step.

eteriSokhoyan commented 7 years ago

@mmiladi Yes, the threshold here is very high, but in the Collect Results step, where we define the clusters, we also have an E-value parameter, which is 0.01 if I'm not mistaken. So cmsearch reports more sequences, and then the Collect Results step keeps only the ones that satisfy our E-value cutoff.
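The reasoning here is that the two filters compose: if Collect Results applies its own, stricter E-value cutoff, the permissive reporting threshold only adds hits that are discarded later anyway. A minimal sketch, assuming hits are (name, E-value) tuples and taking 0.01 as the Collect Results cutoff as recalled above; the function names and hit values are hypothetical.

```python
from typing import Iterable, List, Tuple

Hit = Tuple[str, float]  # (sequence name, E-value); hypothetical minimal record


def cmsearch_report(hits: Iterable[Hit], report_e: float = 10.0) -> List[Hit]:
    """Stage 1: keep every hit under the permissive reporting threshold."""
    return [h for h in hits if h[1] <= report_e]


def collect_results(hits: Iterable[Hit], max_e: float = 0.01) -> List[Hit]:
    """Stage 2: keep only hits under the stricter cluster-membership cutoff."""
    return [h for h in hits if h[1] <= max_e]


raw = [("s1", 1e-6), ("s2", 0.003), ("s3", 0.4), ("s4", 7.0), ("s5", 30.0)]
reported = cmsearch_report(raw)
final = collect_results(reported)
print(len(reported), [name for name, _ in final])  # 4 ['s1', 's2']
```

As long as both stages filter on the same quantity and the downstream cutoff is stricter, the result equals filtering once at 0.01. The catch, raised later in the thread, is when the two stages filter on different quantities.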

eteriSokhoyan commented 7 years ago

So I don't really understand what the problem is. What should I do? Should I change the default E-value in cmsearch for GraphClust to something lower?

mmiladi commented 7 years ago

I see now. The problem is that, with the default settings, cmsearch filters by E-value and returns too many low-quality matches (E-value < 10), while Collect Results filters by CM bit score. In my tests, this combination doesn't work well. I think it's time for the Galaxy pipeline's default parameter values to deviate from the Perl pipeline's defaults. I'll collect a list of settings to change in the default mode.
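Why mixing the two criteria can let weak hits through: in Infernal, E-values depend on the size of the search space, so a fixed bit-score cutoff does not correspond to a fixed E-value. A minimal sketch, assuming each hit carries both a bit score and an E-value; the `Hit` record, `select` helper, the 20-bit cutoff and all hit values are illustrative assumptions, not GraphClust's actual settings.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    # Hypothetical record: name, CM bit score, cmsearch E-value.
    name: str
    bitscore: float
    evalue: float


def select(hits: List[Hit], min_bits: Optional[float] = None,
           max_e: Optional[float] = None) -> List[str]:
    """Keep hits that pass every threshold that is set (None = not applied)."""
    kept = []
    for h in hits:
        if min_bits is not None and h.bitscore < min_bits:
            continue
        if max_e is not None and h.evalue > max_e:
            continue
        kept.append(h.name)
    return kept


# Illustrative values only: a weak hit can clear a low bit-score cutoff
# while its E-value is poor.
hits = [Hit("good", 45.2, 1e-9), Hit("weak", 21.0, 3.5), Hit("noise", 12.3, 8.9)]

print(select(hits, min_bits=20.0))              # ['good', 'weak']
print(select(hits, min_bits=20.0, max_e=0.01))  # ['good']
```

One way to make the stages agree is to lower cmsearch's reporting threshold or to filter on the same quantity downstream; which option GraphClust adopts is not settled in this thread.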

mmiladi commented 7 years ago
eteriSokhoyan commented 7 years ago

OK, the question is: should I change these values in the tools, or just update the workflows? Also, in the Perl pipeline, Results-top-num is 5 by default, isn't it? And what should I use for window-size and shift? Again, I took these values from the original pipeline.
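On window-size and shift: in a sliding-window setup, long input sequences are cut into overlapping fragments of length window-size whose start positions advance by shift. A minimal sketch, assuming shift is an absolute step in nucleotides (the original pipeline may express it differently, e.g. as a percentage of the window size) and that one extra window is aligned to the sequence end so no tail is lost; `windows` is a hypothetical helper, not a GraphClust function, and the values used are illustrative rather than pipeline defaults.

```python
from typing import List, Tuple


def windows(seq_len: int, win_size: int, shift: int) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) windows covering a sequence."""
    if win_size <= 0 or shift <= 0:
        raise ValueError("win_size and shift must be positive")
    if seq_len <= win_size:
        return [(0, seq_len)]  # short sequences are kept whole
    spans = [(s, s + win_size) for s in range(0, seq_len - win_size + 1, shift)]
    if spans[-1][1] < seq_len:
        # Add one window flush with the end so the tail is not dropped.
        spans.append((seq_len - win_size, seq_len))
    return spans


print(windows(250, 150, 50))  # [(0, 150), (50, 200), (100, 250)]
```

A smaller shift gives more overlap and therefore more fragments to cluster, so the choice trades sensitivity against runtime.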

mmiladi commented 7 years ago

It's not urgent, and the list may not be complete. We can discuss next week how best to do it.