edraizen / HistoneDB

Browse all histone sequences by histone variants
http://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0

Multiple variants and H2A.X/Z #101

Closed: molsim closed this issue 9 years ago

molsim commented 9 years ago

After trying to fix the H2As, I have two suggestions: 1) Add variant assignment by an SQEF motif search, and let it override the HMM result when the motif is found. 2) Allow multiple variants to point to one sequence in the curated set. Is this feasible?
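Suggestion 1 amounts to running a motif check before the HMM decision. Below is a minimal sketch, assuming each sequence already has per-variant HMM scores and per-variant thresholds, and that the input is already known to be an H2A. The function name `assign_variant` and the dict layouts are hypothetical, not HistoneDB's actual code; the pattern is the `SQ[ED][YFLI]$` regex used later in this thread.

```python
import re

# Hypothetical: C-terminal H2A.X motif, same pattern as the ORM regex used later in the thread.
H2AX_MOTIF = re.compile(r"SQ[ED][YFLI]$")


def assign_variant(sequence, hmm_scores, thresholds):
    """Return a variant name for an H2A `sequence`, or None if nothing qualifies.

    hmm_scores: dict mapping variant name -> HMM score for this sequence (assumed layout).
    thresholds: dict mapping variant name -> minimum score to accept (assumed layout).
    A motif match overrides the HMM result, as proposed above.
    """
    if H2AX_MOTIF.search(sequence):
        return "H2A.X"
    # Keep only variants whose score clears their own threshold.
    passing = {
        variant: score
        for variant, score in hmm_scores.items()
        if score >= thresholds.get(variant, float("inf"))
    }
    if not passing:
        return None
    # Otherwise fall back to the best-scoring passing HMM.
    return max(passing, key=passing.get)
```

For example, with scores `{"canonicalH2A": 120.0, "H2A.X": 95.0}` and a threshold of 100 for both, a sequence ending in `...KASQEL` is assigned `H2A.X` even though its H2A.X score is below threshold, while one ending in `...KAKK` stays `canonicalH2A`.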

edraizen commented 9 years ago

Have you compared the scores for sequences that exhibit both? I think H2A.X scores above the threshold for them, but they are not classified as H2A.X.

If we used the presence of the H2A.X motif, I think we would have to rethink our whole classification algorithm. Also, the HMM already does this to an extent. I'll take a look soon.

edraizen commented 9 years ago

Here is the logo of the H2A.X HMM. The H2A.X motif does have the most information content.

[Image: HMM logo for H2A.X]

edraizen commented 9 years ago

OK, one problem with adding the regex: only one canonicalH2A sequence matches it:

In [1]: Sequence.objects.filter(variant="canonicalH2A", sequence__regex="SQ[ED][YFLI]$")
Out[1]:
[<Sequence: >398366187|saccharomyces|canonicalH2A
MSGGKGGKAGSAAKASQSRSAKAGLTFPVGRVHRLLRRGNYAQRIGSGAPVYLTAVLEYL
AAEILELAGNAARDNKKTRIIPRHLQLAIRNDDELNKLLGNVTIAQGGVLPNIHQNLLPK
KSAKATKASQEL
>]
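To double-check that this hit is genuine, the same pattern can be tested outside Django with Python's `re.search`, which is roughly what a `__regex` lookup does (the exact regex flavor depends on the database backend). This sketch assumes the stored sequence has no line breaks; the FASTA output above wraps it at 60 residues, so the pieces are joined here.

```python
import re

# The yeast canonicalH2A sequence from the query above, joined into one string.
seq = (
    "MSGGKGGKAGSAAKASQSRSAKAGLTFPVGRVHRLLRRGNYAQRIGSGAPVYLTAVLEYL"
    "AAEILELAGNAARDNKKTRIIPRHLQLAIRNDDELNKLLGNVTIAQGGVLPNIHQNLLPK"
    "KSAKATKASQEL"
)

# Same pattern as the ORM filter: S, Q, then E or D, then Y/F/L/I at the C-terminus.
motif = re.compile(r"SQ[ED][YFLI]$")
match = motif.search(seq)
print(match.group() if match else None)  # SQEL
```

The `$` anchor restricts the match to the C-terminus, so an internal `SQE…` stretch would not count.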
molsim commented 9 years ago

Also, this one (19115333, H2A.X, Schizosaccharomyces pombe 972h-) should show up. But that is in the curated set; the NR would pull in many more.

edraizen commented 9 years ago

Well, that one already comes up as H2A.X! Something is definitely wrong with the filter...

The issue is that the Bootstrap table tries to display the previously selected page number. Since the filtered results fit on one page, that page number is out of range, so no sequences are shown.
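The general remedy for this kind of bug is to clamp the requested page to the range that the filtered result set actually has. The sketch below shows that logic in Python; it is not the actual fix (which lives in the Bootstrap table setup and is not shown in this thread), and `clamp_page` and its parameters are hypothetical names.

```python
import math


def clamp_page(requested_page, total_rows, page_size):
    """Return a 1-based page number that is valid for the current result set.

    A table that restores the previously viewed page can request a page past
    the end after a filter shrinks the results; clamp it to the last page.
    """
    last_page = max(1, math.ceil(total_rows / page_size))
    return min(max(1, requested_page), last_page)
```

For instance, if the user was on page 3 and the new filter returns a single row with 10 rows per page, `clamp_page(3, 1, 10)` gives page 1 instead of an empty page.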

edraizen commented 9 years ago

I fixed the filter, so it should work now, and I updated the score models to allow for the regex classification. Should I close the issue?