Open anna-panchenko opened 8 years ago
Would it be possible to have a clear button so that the sequence box can be cleared? Additionally, the results page could report the highest-ranking result. I am attaching the results for Homo sapiens CenH3: as you can see, they state that the variant is unknown, but from the scores it is clear that it is an H3, and the highest score is with CenH3. Would it be possible to have this page give an educated guess?
histone_variants.txt
Sure, we will make a "clear" button, and I think we will rework the design of this page anyway. As for the results page, I think we will add an explanation that the upper green box is the result of the HMM classifier, and ask the user to look at the BLAST table below in case the HMM classifier fails. In this case the BLAST table makes it clear that the variant is CenH3.
The problem with the HMM classifier for CenH3 is that CenH3s are rather diverse (not monophyletic). There are canonical H3s that score higher against the cenH3 model than some true cenH3s. So we keep the classification threshold high to avoid wrong guesses.
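To illustrate the trade-off described above, here is a minimal Python sketch. It is not the site's actual code: the function names and all scores are invented for illustration. It assumes the threshold is set just high enough that no known canonical H3 passes it, and then shows how many true cenH3 sequences are lost as a result.

```python
from typing import Sequence


def strictest_safe_threshold(negative_scores: Sequence[float]) -> float:
    """Return the lowest threshold that every known non-member fails to exceed."""
    return max(negative_scores)


def recall_at(threshold: float, positive_scores: Sequence[float]) -> float:
    """Fraction of true members whose score is strictly above the threshold."""
    hits = sum(1 for score in positive_scores if score > threshold)
    return hits / len(positive_scores)


# Hypothetical scores against a cenH3 model (invented, not real data).
true_cenh3_scores = [310.0, 240.0, 150.0, 120.0]
canonical_h3_scores = [60.0, 85.0, 170.0]

threshold = strictest_safe_threshold(canonical_h3_scores)
recall = recall_at(threshold, true_cenh3_scores)
print(f"threshold={threshold}, recall={recall}")
```

With overlapping score distributions like these, a threshold that excludes every canonical H3 also rejects the lower-scoring true cenH3 sequences, which is why such sequences end up reported as unknown.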
OK, I had a close look at CenH3. Here is how we can improve the classifier: just report both 1) whether the sequence satisfies our robust criterion, and 2) the model with the maximum HMM score.
In this case, how about saying "H3, unknown variant"? At least we should be able to classify at the histone level.
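A rough Python sketch of the "report both" idea follows. The `classify` function, the `Classification` type, the per-model threshold dictionary, and all numbers are hypothetical, and the robust criterion is assumed here to mean that a score meets its own model's threshold.

```python
from typing import Dict, NamedTuple, Optional


class Classification(NamedTuple):
    robust_variant: Optional[str]  # variant passing its threshold, or None ("unknown")
    best_model: str                # model with the maximum HMM score
    best_score: float


def classify(scores: Dict[str, float], thresholds: Dict[str, float]) -> Classification:
    """Report both the threshold-based call and the top-scoring model."""
    if not scores:
        raise ValueError("no HMM scores given")
    best_model = max(scores, key=lambda model: scores[model])
    # Assumed robust criterion: the score meets that model's own threshold.
    # A model without a threshold can never pass.
    passing = {
        model: score
        for model, score in scores.items()
        if score >= thresholds.get(model, float("inf"))
    }
    robust = max(passing, key=lambda model: passing[model]) if passing else None
    return Classification(robust, best_model, scores[best_model])


# Invented scores and thresholds, for illustration only.
scores = {"cenH3": 180.0, "H3.3": 95.0, "canonical_H3": 90.0}
thresholds = {"cenH3": 250.0, "H3.3": 200.0, "canonical_H3": 200.0}
result = classify(scores, thresholds)
print(result.robust_variant)  # None: no model meets its threshold
print(result.best_model)      # cenH3: the highest-scoring model
```

Keeping the two fields separate lets the page show "unknown" for the robust call while still telling the user which model scored highest.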
Leonardo Mariño-Ramírez marino@marino-johnson.org
---- Alexey Shaytan wrote ----
Sure, we will make a "clear" button, and I think we will rework the design of this page anyway. As for the results page, I think we will add an explanation that the upper green box is the result of the HMM classifier, and ask the user to look at the BLAST table below in case the HMM classifier fails. In this case the BLAST table makes it clear that the variant is CenH3. The problem with the HMM classifier for CenH3 is that CenH3s are rather diverse (not monophyletic), so the decision thresholds that give an acceptable TP/FP ratio are high. In the case of Homo CenH3, simply taking the highest score works, but it might not work in other cases. So we prefer to report unknown rather than report a wrong variant.
I will see what we can do. I would rather refer the user to our BLAST results table; it seems more straightforward to me.
To classify the variant as H3 using HMMs, we would need a combined model for all H3s. But since cenH3 is sufficiently divergent from other H3s, that model might again have problems picking up cenH3.
Please remove this huge defline in the example sequence. What does "Max 1 Sequence in FASTA format." mean? I see "Sequence:" and "File:" on the left edge of the page; what do these mean?