Closed mmiladi closed 7 years ago
Actually there are two wrappers to check: the one that searches with the CM and the one that collects the results.
@mmiladi yeah, here the threshold is very high, but in the Collect Results step, where we define the clusters, we also have an E-value parameter which is, if I'm not wrong, 0.01. So cmsearch reports more sequences, and then in the Collect Results step we keep only the ones that satisfy our E-value.
So I don't really understand what the problem is. I mean, what should I do? Should I change the default E-value in cmsearch for GraphClust to something lower?
I see now. The problem is that with the default settings cmsearch uses an E-value threshold and returns too many low-quality matches (E-value < 10), while Collect Results uses the CM bitscore. In my tests the combination doesn't work well. I think it's time for the Galaxy pipeline's default parameter values to deviate from the perl-pipeline defaults. I'll collect a list of settings to change in default mode.
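To make the mismatch concrete, here is a minimal Python sketch of the two filters applied one after the other. Everything in it is an assumption for illustration: the `Hit` class, the helper names, the toy hits and the bitscore cut-off of 20 are made up, and it is not the code of either wrapper.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    evalue: float
    bitscore: float


def filter_by_evalue(hits, max_evalue):
    """Keep hits whose E-value is at or below the cut-off."""
    return [h for h in hits if h.evalue <= max_evalue]


def filter_by_bitscore(hits, min_bitscore):
    """Keep hits whose bitscore is at or above the cut-off."""
    return [h for h in hits if h.bitscore >= min_bitscore]


# Toy hits, invented for this example only.
hits = [
    Hit("seq_a", 1e-8, 45.2),
    Hit("seq_b", 0.5, 21.0),
    Hit("seq_c", 7.3, 12.4),
]

# Stage 1: permissive reporting threshold (E-value 10), as in the cmsearch step.
reported = filter_by_evalue(hits, 10.0)

# Stage 2a: a bitscore cut-off (hypothetical value) on the reported hits.
kept_by_bitscore = filter_by_bitscore(reported, 20.0)

# Stage 2b: a strict E-value cut-off of 0.01 on the same reported hits.
kept_by_evalue = filter_by_evalue(reported, 0.01)

print([h.name for h in kept_by_bitscore])
print([h.name for h in kept_by_evalue])
```

With these toy numbers the bitscore filter lets `seq_b` through although its E-value is 0.5, so the two second-stage criteria do not select the same set of hits.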
OK, the question is: should I change these values in the tools, or just update the workflows? Also, in the perl pipeline Results-top-num is 5 by default, no? And what should I put for window-size and shift? Because again, I took these values from the original pipeline.
It's not urgent and the list may not be complete. We can discuss next week how best to do it.
The cmsearch step has two types of thresholds: reporting and inclusion. The reporting threshold is very high by default (E-value 10) and should not be changed if it is used as the output for the next step. @eteriSokhoyan Please have a look at this.
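As a rough sketch of how the two thresholds could be set explicitly, the Python helper below only assembles a cmsearch command line. The function name, file names and default arguments are assumptions for illustration; `-E` (reporting E-value), `--incE` (inclusion E-value) and `--tblout` are Infernal cmsearch options, and the 0.01 inclusion default is taken from the Infernal documentation as I understand it, not from this thread.

```python
def build_cmsearch_cmd(cm_file, seq_db, tblout,
                       report_evalue=10.0, include_evalue=0.01):
    """Assemble a cmsearch argument list with explicit thresholds.

    report_evalue  -> -E      (hits up to this E-value are reported)
    include_evalue -> --incE  (hits up to this E-value count as significant)
    """
    return [
        "cmsearch",
        "-E", str(report_evalue),
        "--incE", str(include_evalue),
        "--tblout", tblout,
        cm_file,
        seq_db,
    ]


# Hypothetical file names, for illustration only.
cmd = build_cmsearch_cmd("model.cm", "sequences.fa", "hits.tbl")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where Infernal is installed. Lowering `report_evalue` is one way to keep the low-quality matches out of the next step's input.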