geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

TPV single stranded and double stranded DNA binding #12130

Closed: ValWood closed this issue 7 years ago

ValWood commented 8 years ago

Single-stranded DNA binding and double-stranded DNA binding are under GO:0043566 structure-specific DNA binding: Interacting selectively and non-covalently with DNA of a specific structure or configuration e.g. triplex DNA binding or bent DNA binding.

So I could not locate them easily by drilling down from DNA binding. This does not seem the right place for them?

(needed for curating a paper for a student workshop)

paolaroncaglia commented 8 years ago

Hi Val,

The links are:

single-stranded DNA binding is_a structure-specific DNA binding
structure-specific DNA binding is_a DNA binding

double-stranded DNA binding is_a structure-specific DNA binding
structure-specific DNA binding is_a DNA binding

So both single-stranded and double-stranded DNA binding are descendants of DNA binding (one step removed, so to speak). I agree that both DNA binding and structure-specific DNA binding have several children, so this may not be immediately obvious in the graph, but I don’t see anything wrong with the current structure. What one can’t appreciate in QuickGO is that ‘structure-specific DNA binding’ is needed to provide axioms, e.g.

id: GO:0003697
name: single-stranded DNA binding
[…]
intersection_of: GO:0005488 ! binding
intersection_of: has_input CHEBI:9160 ! single-stranded DNA
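To make the "one step removed" point concrete, here is a minimal Python sketch, assuming a hypothetical in-memory dict of child term to direct is_a parents that contains only the links quoted above (term names stand in for GO IDs; this is not a GO or QuickGO API). It shows that ‘DNA binding’ is an ancestor of ‘single-stranded DNA binding’, while drilling down one level from ‘DNA binding’ only reveals the grouping term.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical slice of the is_a graph, using only the links quoted in this comment.
# Keys are child terms; values are their direct is_a parents.
IS_A: Dict[str, Set[str]] = {
    "single-stranded DNA binding": {"structure-specific DNA binding"},
    "double-stranded DNA binding": {"structure-specific DNA binding"},
    "structure-specific DNA binding": {"DNA binding"},
    "DNA binding": set(),
}


def ancestors(term: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """All terms reachable from `term` by following is_a links upward (breadth-first)."""
    seen: Set[str] = set()
    queue = deque(is_a.get(term, ()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, ()))
    return seen


def direct_children(term: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """Terms listing `term` as a direct is_a parent (what drilling down one level shows)."""
    return {child for child, parents in is_a.items() if term in parents}


if __name__ == "__main__":
    # 'DNA binding' is reachable upward from 'single-stranded DNA binding'...
    print("DNA binding" in ancestors("single-stranded DNA binding", IS_A))  # True
    # ...but one level below 'DNA binding' only the grouping term is visible.
    print(direct_children("DNA binding", IS_A))  # {'structure-specific DNA binding'}
```

In the real ontology ‘DNA binding’ has many more direct children, which is exactly why the single- and double-stranded terms were hard to spot.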

I hope this makes more sense, let me know if I can close this ticket or if you still have concerns.

Thanks, Paola

ValWood commented 8 years ago

I know. I did not think they should be "structure-specific". To me, structure-specific means a particular conformation of DNA, not single- or double-strandedness. Have they always been here?

ValWood commented 8 years ago

2008-04-01 Updated RELATION is a GO:0043566 (structure-specific DNA binding)

Still looks weird to me...

paolaroncaglia commented 8 years ago

I think it boils down to the ‘configuration’ bit in the def of GO:0043566 structure-specific DNA binding: Interacting selectively and non-covalently with DNA of a specific structure or configuration e.g. triplex DNA binding or bent DNA binding. I guess that’s why e.g. chromatin DNA binding and DNA end binding are also under there. If that still sounds unsatisfying, I can discuss with other editors tomorrow.

Cheers, Paola

ValWood commented 8 years ago

Well, I can see 'triplex' as a configuration, but everything else is either single-stranded or double-stranded, and following this logically would mean that "structure-specific" binding encompassed everything, so it is unsatisfying to me :)

Q: what would NOT go under this term (eventually)?

paolaroncaglia commented 8 years ago

Hi Val,

I agree, I'll bring this up at the editors call today.

Cheers,

Paola

paolaroncaglia commented 8 years ago

Hi Val,

Here's the resolution from the editors call today:

AI: In the short term, Paola to place ‘single-stranded DNA binding’ and ‘double-stranded DNA binding’ directly under ‘DNA binding’. Then look at other descendants of ‘DNA binding’ and place them as appropriate directly under ‘single-stranded DNA binding’, ‘double-stranded DNA binding’ or ‘sequence-specific DNA binding’, where such links may be necessary and are currently missing. E.g., ‘left-handed Z-DNA binding’ should be directly is_a ‘double-stranded DNA binding’ (it is currently a descendant). At the end, evaluate whether ‘structure-specific DNA binding’ may be reworded based on its remaining children, and whether it is indeed a necessary term. How are people using it, and is it relevant for enrichment? (I think it would be only if it were more informative and less broad than it currently is.) As of today there are ~170 direct manual annotations; we will need to examine them.

Hope this sounds good, will implement first step tomorrow,

Cheers

Paola

ValWood commented 8 years ago

That sounds sensible. I'm also not convinced the 'structure-specific DNA binding' is a necessary term. If you know that it is 'structure-specific binding' you should be able to say what that structure is....

paolaroncaglia commented 8 years ago

First part done: placed ‘single-stranded DNA binding’ and ‘double-stranded DNA binding’ directly under ‘DNA binding’

Rest to follow. Removing high-priority tag for now.

Thanks Paola

paolaroncaglia commented 7 years ago

Hi @ValWood,

Following up on my proposal here (the first bit was already done): https://github.com/geneontology/go-ontology/issues/12130#issuecomment-154170393

Looking at the children of GO:0043566 ‘structure-specific DNA binding’:

1) GO:0003681 bent DNA binding Interacting selectively and non-covalently with DNA in a bent conformation. No other parent; no children. The supporting paper (PMID:12627977) mentions DNA duplexes, but doesn’t seem to be annotated itself. The term has 11 experimental annotations from 5 papers (EcoliWiki, FlyBase, MGI, UniProt). They’d need to be checked to verify whether the binding is to double-stranded DNA in all these cases. However, my understanding is that bending can only occur in double-stranded DNA, see https://en.wikipedia.org/wiki/Nucleic_acid_double_helix#Bending. Would you agree? If yes, I’d make ‘bent DNA binding’ is_a ‘double-stranded DNA binding’ (but wouldn’t remove is_a ‘structure-specific DNA binding’ for now).

2) GO:0031490 chromatin DNA binding Interacting selectively and non-covalently with DNA that is assembled into chromatin. Other parent: is_a ‘chromatin binding’. If the intent of having this term is to specify that a gene product binds to DNA but not to other components of chromatin, then it should be is_a ‘DNA binding’, not ‘chromatin binding’, as chromatin includes RNA and/or proteins as well as DNA… As for the link to ‘structure-specific DNA binding’, I don’t think it adds much information in this case. Chromatin is not a ‘structure’. There are 195 experimental annotations to the term and its children.

3) GO:0045027 DNA end binding Interacting selectively and non-covalently with the ends of DNA that are exposed by the creation of double-strand breaks (DSBs). In this case it’s not possible to place it under the ‘single-stranded’ or ‘double-stranded’ terms. If we ever get rid of ‘structure-specific DNA binding’, ‘DNA end binding’ should sit directly under ‘DNA binding’. (No other parent; no children. 4 experimental annotations.)

4) GO:0000217 DNA secondary structure binding Interacting selectively and non-covalently with DNA containing secondary structure elements such as four-way junctions, bubbles, loops, Y-form DNA, or double-strand/single-strand junctions. In this case it’s not possible to place it under the ‘single-stranded’ or ‘double-stranded’ terms, though it might be possible for some of the term’s children. If we ever get rid of ‘structure-specific DNA binding’, ‘DNA secondary structure binding’ should sit directly under ‘DNA binding’. (No other parent; 63 experimental annotations to term + children.)

5) GO:0070336 flap-structured DNA binding Interacting selectively and non-covalently with a flap structure in DNA. A DNA flap structure is one in which a single-stranded length of DNA or RNA protrudes from a double-stranded DNA molecule. My understanding is that the flap structure covers both single- and double-stranded DNA, so it’s not possible to place the term (or its children) under ‘single-stranded’ or ‘double-stranded’. If we ever get rid of ‘structure-specific DNA binding’, ‘flap-structured DNA binding’ should sit directly under ‘DNA binding’. (No other parent; 7 experimental annotations to term + children.)

6) GO:0051880 G-quadruplex DNA binding Interacting selectively and non-covalently with G-quadruplex DNA structures, in which groups of four guanines adopt a flat, cyclic Hoogsteen hydrogen-bonding arrangement known as a guanine tetrad. The stacking of guanine tetrads results in G-quadruplex DNA structures. G-quadruplex DNA can form under physiological conditions from some G-rich sequences, such as those found in telomeres, immunoglobulin switch regions, gene promoters, fragile X repeats, and the dimerization domain in the human immunodeficiency virus (HIV) genome. In this case it’s not possible to place it under the ‘single-stranded’ or ‘double-stranded’ terms. If we ever get rid of ‘structure-specific DNA binding’, ‘G-quadruplex DNA binding’ should sit directly under ‘DNA binding’. (No other parent; no children; 15 experimental annotations.)

7) GO:0003692 left-handed Z-DNA binding Interacting selectively and non-covalently with DNA in the Z form, i.e. a left-handed helix in which the phosphate backbone zigzags. No other parent; no children. I made it is_a ‘double-stranded DNA binding’ (but didn’t remove is_a ‘structure-specific DNA binding’ for now).

8) GO:0003695 random coil DNA binding Interacting selectively and non-covalently with DNA in a random coil configuration. No other parent; no children. I’d guess that this is always double-stranded DNA. At any rate the term has no annotations or mappings, and I think it could be safely merged into its parent or into ‘DNA binding’. (NBO uses it.)

9) GO:0097100 supercoiled DNA binding Interacting selectively and non-covalently with supercoiled DNA. For example, during replication and transcription, template DNA is negatively supercoiled in the receding downstream DNA and positively supercoiled in the approaching downstream DNA. No other parent; no children. Supercoiled DNA is always double-stranded, so I made it is_a ‘double-stranded DNA binding’ (but didn’t remove is_a ‘structure-specific DNA binding’ for now). (9 experimental annotations.)

10) GO:0045142 triplex DNA binding Interacting selectively and non-covalently with a DNA triple helix. The formation of triple helical DNA has been evoked in several cellular processes including transcription, replication, and recombination. In this case it’s not possible to place it under the ‘single-stranded’ or ‘double-stranded’ terms. If we ever get rid of ‘structure-specific DNA binding’, ‘triplex DNA binding’ should sit directly under ‘DNA binding’. (No other parent; no children; 3 experimental annotations.)

So, the terms that at the moment have ‘structure-specific DNA binding’ as their only is_a parent, and that would end up under is_a ‘DNA binding’ if ‘structure-specific DNA binding’ were removed/merged, are:

bent DNA binding??
chromatin DNA binding??
DNA end binding
DNA secondary structure binding
flap-structured DNA binding
G-quadruplex DNA binding
random coil DNA binding??
triplex DNA binding

(See above for the question marks)

My feeling is that yes, if a GP is binding to DNA in a specific structure/conformation you would know it, and could therefore in theory use one of the child terms; then ‘structure-specific DNA binding’ could be disposed of. It currently has 16 direct experimental annotations, all IDA (EcoCyc, RGD, SGD, UniProt). Given the low numbers and the fact that the term is not incorrect per se, I’d opt for merging (vs. obsoleting), but I’d still inform the databases in case they wanted to rehouse their annotations.

@ValWood, please let me know if you have any feedback on those question marks, and if you’d agree with the general strategy.

ValWood commented 7 years ago

I think merging the "structure-specific DNA binding" term would help in the location of terms and make for more consistent/accurate annotation, especially as the term itself is so infrequently used.

It is not clear to me exactly what "DNA secondary structure binding" means either. It is defined by the list of terms which are housed under it, but how do we define DNA secondary structure? I'm trying to think by analogy with protein secondary structure (hydrogen bonds etc.), but there does not seem to be a correspondence; a 4-way junction sounds more analogous to a primary structure?

Other than that, which is probably due to my lack of knowledge, it sounds sensible to me. @mah11 will be more knowledgeable on this. Does it sound OK?

mah11 commented 7 years ago

On "structure-specific DNA binding", I don't have strong feelings either way, so no objection to Paola's proposal.

The "DNA secondary structure binding" has never confused me (its definition sounds straightforward), so I'm not sure I can help Val understand it. The structures it groups, such as four-way junctions, Ys, bubbles, etc. do sound roughly analogous to protein secondary structures; primary structure is essentially the sequence for both proteins and nucleic acids.

ValWood commented 7 years ago

So G-quadruplex's flaps and coils would also be secondary structures in this case? Everything which is not primary or tertiary?

mah11 commented 7 years ago

I don't understand this q - as far as I know, G-quadruplexes don't have flaps and coils.

ValWood commented 7 years ago

GO:0051880 G-quadruplex DNA binding
GO:0003695 random coil DNA binding
GO:0070336 flap-structured DNA binding

paolaroncaglia commented 7 years ago

Thanks @ValWood and @mah11 for your feedback. @ValWood, am I interpreting your last 2 comments correctly as “should we place GO:0051880 G-quadruplex DNA binding, GO:0003695 random coil DNA binding and GO:0070336 flap-structured DNA binding under (is_a) ‘DNA secondary structure binding’?” Or did you mean something else? If we’re not sure, I’d just place them under ‘DNA binding’, and if/when people have strong reasons to move them further down, that can be addressed. Actually, I think I’ll go that route anyway: the main focus here at this point is to get rid of ‘structure-specific DNA binding’, rehouse terms as well as we can within a reasonable time, and make sure we don’t introduce TPVs. I’ll do that.

paolaroncaglia commented 7 years ago

Now done:

Moved ‘bent DNA binding’ from is_a ‘structure-specific DNA binding’ to is_a ‘double-stranded DNA binding’
Moved ‘chromatin DNA binding’ from is_a ‘structure-specific DNA binding’ to is_a ‘DNA binding’ (on second thought, I’ll leave the other parent, is_a ‘chromatin binding’, as is: if a GP binds DNA that is part of a chromatin structure, it is also binding to the chromatin as a whole)
Moved ‘DNA end binding’ from is_a ‘structure-specific DNA binding’ to is_a ‘DNA binding’
Moved ‘DNA secondary structure binding’ from is_a ‘structure-specific DNA binding’ to is_a ‘DNA binding’
Moved ‘flap-structured DNA binding’ from is_a ‘structure-specific DNA binding’ to is_a ‘DNA binding’
Moved ‘G-quadruplex DNA binding’ from is_a ‘structure-specific DNA binding’ to is_a ‘DNA binding’
Deleted link ‘left-handed Z-DNA binding’ is_a ‘structure-specific DNA binding’ (I made it ‘double-stranded DNA binding’ last week)
Moved ‘random coil DNA binding’ from is_a ‘structure-specific DNA binding’ to is_a ‘DNA binding’
Deleted link ‘supercoiled DNA binding’ is_a ‘structure-specific DNA binding’ (I made it ‘double-stranded DNA binding’ last week)
Moved ‘triplex DNA binding’ from is_a ‘structure-specific DNA binding’ to is_a ‘DNA binding’
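Most of the moves above follow one pattern: replace the merged parent with its own parent, then drop any direct link that another direct parent already implies (a local transitive reduction). A minimal Python sketch of that rule, assuming a hypothetical child-to-parents dict covering only a few of the terms (this is an illustration, not the editors’ actual tooling, which works on the ontology source files). Note that the move of ‘bent DNA binding’ to ‘double-stranded DNA binding’ was a curation judgement that no such rule would make.

```python
from typing import Dict, Set

# Hypothetical slice of the is_a graph just before the merge (child -> direct parents).
# 'double-stranded DNA binding' is already directly under 'DNA binding' (first step of this ticket).
IS_A: Dict[str, Set[str]] = {
    "DNA binding": set(),
    "double-stranded DNA binding": {"DNA binding"},
    "structure-specific DNA binding": {"DNA binding"},
    "left-handed Z-DNA binding": {"structure-specific DNA binding", "double-stranded DNA binding"},
    "triplex DNA binding": {"structure-specific DNA binding"},
    "chromatin DNA binding": {"structure-specific DNA binding", "chromatin binding"},
}


def ancestors(term: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """All terms reachable from `term` via is_a links (depth-first)."""
    seen: Set[str] = set()
    stack = list(is_a.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, ()))
    return seen


def merge_into_parent(merged: str, parent: str, is_a: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Drop `merged`, re-point its children to `parent`, then prune implied links."""
    # Copy the graph without the merged term, swapping `merged` for `parent` in every parent set.
    repointed = {
        term: {parent if p == merged else p for p in parents}
        for term, parents in is_a.items()
        if term != merged
    }
    # Keep a direct link p only if no other direct parent q already has p as an ancestor.
    return {
        term: {
            p for p in parents
            if not any(p in ancestors(q, repointed) for q in parents if q != p)
        }
        for term, parents in repointed.items()
    }


if __name__ == "__main__":
    after = merge_into_parent("structure-specific DNA binding", "DNA binding", IS_A)
    for term in sorted(after):
        print(term, "is_a", sorted(after[term]))
```

On this slice, ‘left-handed Z-DNA binding’ ends up only under ‘double-stranded DNA binding’ (the redundant link to ‘DNA binding’ is pruned, mirroring "Deleted link"), ‘triplex DNA binding’ moves to ‘DNA binding’, and ‘chromatin DNA binding’ keeps both ‘DNA binding’ and ‘chromatin binding’.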

For me to do next:

@ValWood: if you’d like any child of ‘DNA binding’ to be housed under a more specific parent, please open a new ticket. Again, there shouldn’t be an immediate need to do that, as the structure I’ll implement is not incorrect, is already as specific as it can be given this ‘first’ pass, and we’ve addressed your point of getting rid of the confusing and not very informative ‘structure-specific DNA binding’. Thanks. :-)

paolaroncaglia commented 7 years ago

gene_association_GO_0043566.xlsx

paolaroncaglia commented 7 years ago

Hi @slaulederkind, This is just to inform you that, following discussion above, we are going to merge GO:0043566 ‘structure-specific DNA binding’ into its parent GO:0003677 ‘DNA binding’. This will affect 1 RGD annotation to ‘structure-specific DNA binding’; it will be rehoused under ‘DNA binding’. If you agree to this change, you do not need to do anything. If, instead, you’d prefer to review your annotation and e.g. rehouse it under a more specific descendant of ‘DNA binding’, please find the affected RGD annotation in the attached file. Thanks!

paolaroncaglia commented 7 years ago

Hi @srengel, This is just to inform you that, following discussion above, we are going to merge GO:0043566 ‘structure-specific DNA binding’ into its parent GO:0003677 ‘DNA binding’. This will affect 3 SGD annotations to ‘structure-specific DNA binding’; they will be rehoused under ‘DNA binding’. If you agree to this change, you do not need to do anything. If, instead, you’d prefer to review your annotations and e.g. rehouse them under a more specific descendant of ‘DNA binding’, please find the affected SGD annotations in the attached file. Thanks!

paolaroncaglia commented 7 years ago

I emailed EcoCyc and UniProt.

paolaroncaglia commented 7 years ago

Merged ‘structure-specific DNA binding’ into its parent ‘DNA binding’. Closing now.

srengel commented 7 years ago

Thanks Paola. I'm going to just leave these 3 alone and let them be rehoused under 'DNA binding'.

paolaroncaglia commented 7 years ago

Thanks @srengel .

paolaroncaglia commented 7 years ago

Sylvain at UniProt also agrees with the merge.