edraizen / HistoneDB

Browse all histone sequences by histone variants
http://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0

"Analyze sequences" #134

Open anna-panchenko opened 8 years ago

anna-panchenko commented 8 years ago

Please remove the huge defline from the example sequence. What does "Max 1 Sequence in FASTA format." mean? I also see "Sequence:" and "File:" on the left edge of the page. What do these mean?

leonardomarino commented 8 years ago

Would it be possible to have a clear button so the sequence box can be cleared? Additionally, the results page could report the highest-ranking result. I am attaching the results for Homo sapiens CenH3. As you can see, the results state that the variant is unknown, but looking at the scores it is clear that it is an H3, and the highest score is with CenH3. Would it be possible to have this page give an educated guess?

Attachments: histone_variants.txt, screen shot 2015-09-29 at 10 57 34 am

molsim commented 8 years ago

Sure, we will make a "clear" button, and I think we will rework the design of this page anyway. As for the results page, I think we will add an explanation that the upper green box is the result of the HMM classifier, and ask the user to focus on the BLAST table below in case the HMM classifier fails. In this example the BLAST table makes it clear that the variant is CenH3.

The problem with the HMM classifier for CenH3 is that CenH3 sequences are rather diverse (not monophyletic). There are canonical H3 sequences that score higher against the cenH3 model than some true cenH3 sequences do. So we keep the classification threshold high to avoid wrong guesses.

molsim commented 8 years ago

OK, I had a close look at CenH3. Here is how we can improve the classifier: just report both 1) whether the sequence satisfies our robust criterion, and 2) the model with the maximum HMM score.
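A minimal sketch of what "report both" could look like, assuming the classifier already has one HMM score per variant model and one score threshold per model. The function name, the variant labels, and all numbers below are hypothetical and are not taken from the HistoneDB code.

```python
from typing import Dict, Optional, Tuple


def classify(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (confident_variant, best_scoring_variant).

    confident_variant is the highest-scoring model whose score reaches its
    own threshold (the "robust criterion"), or None if no model passes.
    best_scoring_variant is simply the model with the maximum score.
    """
    if not scores:
        return None, None

    best = max(scores, key=scores.get)

    # A model with no threshold defined can never pass the robust criterion.
    passing = [
        name for name, score in scores.items()
        if score >= thresholds.get(name, float("inf"))
    ]
    confident = max(passing, key=scores.get) if passing else None
    return confident, best


# Hypothetical scores: cenH3 scores highest but stays below its (high) threshold.
example_scores = {"cenH3": 120.0, "H3.3": 95.0, "canonical_H3": 90.0}
example_thresholds = {"cenH3": 200.0, "H3.3": 150.0, "canonical_H3": 150.0}

confident, best = classify(example_scores, example_thresholds)
print(f"Robust classification: {confident or 'unknown'}")
print(f"Highest-scoring model: {best}")
```

Keeping the two values separate lets the page show "unknown" in the green box while still naming the top-scoring model as a hint.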

leonardomarino commented 8 years ago

In this case, how about saying "H3, unknown variant"? We should at least be able to classify at the histone type level.
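One way to sketch this fallback is to take the histone type from the top-scoring variant model whenever no variant passes the threshold. This is a simple lookup on the best-scoring model, not a combined per-type HMM, and the mapping, labels, and function name below are hypothetical.

```python
from typing import Optional

# Hypothetical mapping from variant model name to histone type.
VARIANT_TO_TYPE = {
    "cenH3": "H3",
    "H3.3": "H3",
    "canonical_H3": "H3",
    "H2A.Z": "H2A",
    "canonical_H4": "H4",
}


def describe(confident: Optional[str], best: Optional[str]) -> str:
    """Build the label shown to the user.

    confident: variant that passed the classification threshold, or None.
    best: variant model with the highest score, or None if nothing scored.
    """
    if confident is not None:
        return confident
    if best is None:
        return "unknown"
    histone_type = VARIANT_TO_TYPE.get(best)
    if histone_type is None:
        return "unknown"
    return f"{histone_type}, unknown variant"


print(describe(None, "cenH3"))
```

The caveat is that this is only as reliable as the top-scoring model: if the best score belongs to a model of the wrong histone type, the type-level label is wrong too.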

Leonardo Mariño-Ramírez marino@marino-johnson.org

---- Alexey Shaytan wrote ----

Sure, we will make a "clear" button, and I think we will rework the design of this page anyway. As for the results page, I think we will add an explanation that the upper green box is the result of the HMM classifier, and ask the user to focus on the BLAST table below in case the HMM classifier fails. In this example the BLAST table makes it clear that the variant is CenH3. The problem with the HMM classifier for CenH3 is that CenH3 sequences are rather diverse (not monophyletic), so the decision thresholds that give an acceptable TP/FP ratio are high. In the case of human CenH3, simply taking the highest score works, but it might not work in other cases. So we prefer to report "unknown" rather than report a wrong variant.


molsim commented 8 years ago

I will see what we can do. I would rather refer the user to our BLAST results table; it seems more straightforward to me.

To classify a sequence as H3 using HMMs, we would need a combined model for all H3 variants. But since cenH3 is quite divergent from other H3s, that model might again have trouble recognizing cenH3.