PROconsortium / PRoteinOntology


Term issue: Dicty modified histones #293

Closed nataled closed 2 years ago

nataled commented 2 years ago

Dear @karenross,

I know it’s late now but for next week or so, here is my full table for mostly developmentally regulated modifications. I hope it’s all more or less ok what I did. Please let me know if you have any questions.

Note that both H3 proteins in particular are highly conserved at the N-terminus, so it is likely that the initiator M is clipped off. The others are also conserved, though not as exactly as at the 5’ end, but researchers think they behave as in most other organisms where this has been shown.

Also, Dicty has many acetylases, and the exact match for each one has not been investigated as far as I know. I will check further and let you know if I find anything, though I doubt I will.

Many thanks for all your help and have a nice weekend!

@pfey03 PRO Dicty Table annotations.xlsx

nataled commented 2 years ago

Hi Petra, Karen, I created this issue to facilitate communication regarding the requested terms. I have already two questions for @pfey03:

1) For the term on line 10 of the table (K37 trimethylation of H3.3b), is the evidence for the methylase (set2) given in PMID:22600736, PMID:16469305, or some other? If unknown that's okay.

2) Line 5, H3.3b K5-trimethylated; acetylated - Is the acetylation meant to be on a specific residue, or is it meant to be "any acetylation"? If the former, is the residue K5?

pfey03 commented 2 years ago

@nataled For line 10 (and also line 11, H3a), PMID:22600736 is the correct reference. The main text and Figure S2 describe the experiment showing that set2 (Q55FF7) is the methyltransferase for what they call K36.

Line 5: I'm sorry, I forgot to specify the acetylation in the Position/Modification column. The acetylation shown by AU gel is, as in higher organisms, targeted towards the already modified K4me3, which becomes K4me3A. It is acetylated on the methylated tail of K4me3.

Line 6 is the only line with two references, as there the modifying enzyme is assigned by similarity: it is the only Dicty ortholog of su(var), called suvA in Dicty (Q55DR9, not well annotated in UniProt; I might request annotation).

Does that clear it up? Many thanks, Petra

pfey03 commented 2 years ago

@nataled The references in lines 2-4 or 5 are all correct.

However, I messed up line 6 in many places. Here is the complete, corrected line 6, with columns separated by pipes:

Q55BN9 | Histone H3.3 type b | K10 methylation | Methylation | 22600736, 16469305 | experimental | In vitro | Q55DR9 (this is inferred by similarity in PMID: 22600736, 16469305 and is likely, as it is a single gene in Dd) | A histone H3.3 type b (Dictyostelium discoideum) that has been trimethylated at the position equivalent to Lys-10 of the amino acid sequence represented by UniProtKB:Q55BN9. Note that the indicated residue is called Lys-9 in the literature, based on sequence numbering that accounts for a removed initiator methionine. UniProtKB:Q55BN9, Lys-10, MOD:00085. [PMID:22600736 dictyBase:PF]
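The numbering convention used in these definitions (literature numbers that skip the removed initiator methionine versus positions in the UniProtKB sequence) can be illustrated with a minimal sketch. The function names below are hypothetical and are not part of PRO or dictyBase tooling; the only assumption is the one stated in the definitions, namely that the literature numbering is offset by one because the initiator methionine is removed.

```python
def literature_to_uniprot(position: int, initiator_met_removed: bool = True) -> int:
    """Convert a literature residue number to a UniProtKB sequence position.

    Assumption: literature numbering starts after the removed initiator
    methionine, so the UniProtKB position is one higher.
    """
    if position < 1:
        raise ValueError("residue positions are 1-based")
    return position + 1 if initiator_met_removed else position


def uniprot_to_literature(position: int, initiator_met_removed: bool = True) -> int:
    """Inverse conversion: UniProtKB sequence position to literature number."""
    if position < 1:
        raise ValueError("residue positions are 1-based")
    if initiator_met_removed and position == 1:
        raise ValueError("position 1 is the removed initiator methionine")
    return position - 1 if initiator_met_removed else position


# Examples taken from this thread: literature K9 is Lys-10 in Q55BN9,
# and the residue the paper calls K36 is K37 in the table.
print(literature_to_uniprot(9))   # 10
print(literature_to_uniprot(36))  # 37
print(uniprot_to_literature(10))  # 9
```

Keeping both directions explicit makes it easier to check a definition (UniProtKB numbering) against the note that follows it (literature numbering).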

I can also send the updated table if you wish.

nataled commented 2 years ago

@pfey03 No need for an updated table. I changed it in my local copy. Regarding the 'UniProt name' for lines 3-5, these should lack any mention of PTM state. I also note that the indicated PTMs in that column differ from those given in the modification columns. I presume the latter are correct?

pfey03 commented 2 years ago

@nataled OK, I thought I would add the kind of pre-state, but you can delete the modifications in column 2 if that's incorrect. And it's true, the parent of all modifications in that case is the H3b histone; my mistake, I deleted it in my table too. I hope it's all clear now. Sorry, but the table was difficult with all the checking of papers and modifications; one by one is easier.

nataled commented 2 years ago

@pfey03 I see. FYI we usually do not attempt to incorporate the intermediate steps (within PRO) for how a given proteoform arises. That's because a single form can arise via multiple paths (for example, a proteoform with two phosphorylations could have come from kinase activity on a form with a single phosphorylation, or from phosphatase activity on a form that had three phosphorylated sites).

Sorry that the table format has been less usable than one by one. We'll be working on a simplified format for tables in the future, but always feel free to use whichever method you find most convenient.

pfey03 commented 2 years ago

@nataled Thanks Darren. In the future, depending on publications, I won't have so many, and then one by one will be OK. I used the first two of my H3a PRO IDs for extensions in Protein2GO, but they are not recognized yet; I think it might take a while until they update their files. But it was accepted, and eventually they will probably know what they are. For Noctua we are still discussing how best to accept PRO IDs.

nataled commented 2 years ago

Hi @pfey03, For line 19, in the column for 'Type of modification' I see 'Acetylation-2, Methylation'. Does 'Acetylation-2' mean something in particular, like (guessing here) diacetylation on a single residue (does that happen?) or perhaps it just refers to the two acetylation sites (K14, K22) indicated?

pfey03 commented 2 years ago

Hi @nataled In line 19 I have it very clearly in Positions/Modifications: K14 acetylation / K18 monomethylation / K22 acetylation. I then just shortened it; I only wanted to indicate 2 acetylations and 1 methylation. In the note: "A histone H4 (Dictyostelium discoideum) that has been acetylated at the position equivalent to Lys-14, monomethylated at Lys-18, and acetylated at Lys-22 of the amino acid sequence represented by UniProtKB:Q76NW2. Note that the indicated residues are called Lys-13, Lys-17, and Lys-21 ..." Just at the very end of the note I forgot to change it after copying (Lys-10 is still there), and I do not know if all 3 modifications should be added there instead. Edit the note as you see fit. Thanks!

nataled commented 2 years ago

@pfey03 Thanks for the clarification. I don't ever want to make assumptions, so when I see something with potential internal inconsistency or that could have multiple interpretations--no matter how unlikely one might be--I have to check.

Terms will be finished soon.

nataled commented 2 years ago

Terms are finished and usable now. For review purposes, I placed all requested terms in a file (attached). I also added a few columns to the table to show the PR identifiers (also attached).

@karenross please check the new terms for accuracy and consistency. I did so already, but I want to make sure I didn't miss anything.

FYI, those 'Contains...' statements use what's called the Brno nomenclature for histones. These are described in https://www.nature.com/articles/nsmb0205-110.pdf

PRO Dicty Table annotations - DN.xlsx PRO dictyBase histones.obo.txt

pfey03 commented 2 years ago

@nataled Many thanks for the table and txt files, really fast thank you!

The paper is very higher-organism centric. In Dicty, H3a and H3b are different and do not always have the same modifications (experimentally shown). And we have H2Ax and H2Bv3 as H2, and several H2 domain-containing proteins (the latter have no role in the publications so far). H4a and H4b have the same protein sequence and only one UniProt record, which is fine. They have also not been separated experimentally; it's not easy. We have one H1 that was published a longer time ago and gets phosphorylated in early development.

karenross commented 2 years ago

  1. We could add Note: In vitro to the comment line for all terms.
  2. Should we be concerned about making the numbering of the modified forms consistent with the human orthologs? For example, for human histone H3.3A (not actually sure if this is the ortholog for Dicty histone H3.3 type a), monomethylated 1 is the Lys-37 methylation and monomethylated 2 is the Lys-5 methylation, and we have the Lys-5 as monomethylated 1 in Dicty.

nataled commented 2 years ago

@karenross yes, I'll add the note. As for name normalization, it's something I'd like done--someday--but there are so many possible issues that it would be better to do it globally after careful consideration.

Other than your two suggestions, were all terms error-free?

karenross commented 2 years ago

Yes, otherwise all terms look good.

pfey03 commented 2 years ago

It is difficult to decide between in vivo and in vitro for Dicty. They often use mutants and also have phenotypes, but because of the acetic AU gel, where you see the ladder of modifications with specific antibodies, I decided to say 'in vitro'.

pfey03 commented 2 years ago

@nataled Hi Darren, PR:000060206 is wrong. It's not H2AX but H4 in my file in line 19. It is H4K13acK17me1K21ac not H2AX. This H4 modification is very much involved in development and not much available when cells are dividing and with nutrition (vegetative growth as it's called in Dicty field)

Def: A histone H4 (Dictyostelium discoideum) that has been acetylated at the position equivalent to Lys-14, monomethylated at Lys-18, and acetylated at Lys-22 of the amino acid sequence represented by UniProtKB:Q76NW2. Note that the indicated residues are called Lys-13, Lys-17, and Lys-21 respectively in the literature, based on sequence numbering that accounts for a removed initiator methionine. UniProtKB:Q76NW2, Lys-14,Lys-18,Lys-22 MOD:00085. [PMID:33947439 dictyBase:PF]

nataled commented 2 years ago

Ugh, sorry, fixed. I did them out of order and in my mind I was dealing only with H2AX for the last few. The fixed version is live now.

pfey03 commented 2 years ago

Thanks so much!

karenross commented 2 years ago

Annotating as in vitro: Usually we reserve in vitro for assays done entirely in vitro with recombinant/purified proteins (e.g., a kinase assay done by mixing purified kinase and substrate in vitro). In other words, for cases where the modification is taking place in vitro. Petra, from your description it sounds like even in the case of the Ab/ladder of bands assay, the modifications took place in vivo. If so, we could annotate as in vivo. If you think there is some ambiguity, it is fine to leave that annotation out entirely.

pfey03 commented 2 years ago

@karenross Yes. In Dicty everything is more direct, because the cells are single eukaryotic cells when growing, can be brought back with food even when developing, and can be separated during development. So it is much closer to the organismal processes than any higher-organism cell culture. That's why I was conflicted. Maybe change it all to 'in vivo' and it will be fine, as it is very direct. For example, they mutate K4 to alanine and see no methylation and developmental defects, and all these things are very much like it is in vivo.

karenross commented 2 years ago

@nataled Please change the evidence type in the comment to in vivo for all terms

nataled commented 2 years ago

Will do. I'll keep this issue open until next week, in case anything comes up.

pfey03 commented 2 years ago

@nataled Is it easier if I change it in RACE-PRO and you just have to approve? I will do nothing else, just change in vitro to in vivo.

nataled commented 2 years ago

@pfey03 no need; I've already made the change in the master file. But because it is such a minor change and coming at a point where the release process has started, I did not yet upload that file.

That being said, in general, GitHub works well for such change requests.

pfey03 commented 2 years ago

@nataled In PR:000060204 the comment line has H2AXK9me1; the other parts are OK. The whole form in the Dicty literature is H2AXS1acS8phK9me2.

This entry is dimethylated so shouldn't the above be H2AXK9me2?

Thanks!

pfey03 commented 2 years ago

I also have to add a very few terms one by one that didn't make it into the list, but I have to see which are important.

pfey03 commented 2 years ago

@nataled PR:000060206 also has a mistake in the comment, it is two acetylations and one methylation: H4K13acK17me1K21ac

The Def is correct: A histone H4 (Dictyostelium discoideum) that has been acetylated at the positions equivalent to Lys-14 and Lys-22, and monomethylated at the position equivalent to Lys-18 of the amino acid sequence represented by UniProtKB:Q54WG6. Note that the indicated residues are called Lys-13, Lys-21, and Lys-17, respectively, in the literature, based on sequence numbering that accounts for a removed initiator methionine.

Corrected Comment: Category=organism-modification. Requested by=dictyBase. Note: In vivo. Contains H4K13ac, H4K21ac, and H4K17me1.

Just fix the comment and it will be fine, thanks!

nataled commented 2 years ago

Thanks Petra! I fixed them internally and will upload when I'm at a good place to do so (lots of editing happening now). Regarding things like 'H2AXS1acS8phK9me2' I originally did add these as synonyms, but then I came to realize that the nomenclature is not for the whole protein but rather for specific amino acid modifications. That's why I changed to indicating them in the comments; I thought it would still be useful to find the modifications of interest to a user.
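The point that labels such as 'H2AXS1acS8phK9me2' name individual residue modifications rather than a whole protein can be sketched as follows. This is an illustration only, not how PRO generates its comment lines: the function name `split_marks` is hypothetical, the histone name is passed in explicitly rather than parsed, and the set of modification suffixes is assumed to be just the ones that appear in this thread (ac, ph, me1, me2, me3).

```python
import re

# One mark = residue letter, literature position, modification suffix.
# Assumption: only the suffixes seen in this thread are supported.
MARK = re.compile(r"([A-Z])(\d+)(ac|ph|me[123])")


def split_marks(label: str, histone: str) -> list[str]:
    """Split a combined label into one label per modified residue."""
    if not label.startswith(histone):
        raise ValueError(f"{label!r} does not start with {histone!r}")
    body = label[len(histone):]
    # Reject anything that is not a clean run of recognised marks.
    if not re.fullmatch(rf"(?:{MARK.pattern})+", body):
        raise ValueError(f"unrecognised modification string: {body!r}")
    return [f"{histone}{res}{pos}{mod}" for res, pos, mod in MARK.findall(body)]


print(split_marks("H2AXS1acS8phK9me2", "H2AX"))
# ['H2AXS1ac', 'H2AXS8ph', 'H2AXK9me2']
print(split_marks("H4K13acK17me1K21ac", "H4"))
# ['H4K13ac', 'H4K17me1', 'H4K21ac']
```

Passing the histone name explicitly avoids guessing where a name like H2AX ends and the first residue letter begins.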

pfey03 commented 2 years ago

Thanks Darren, and Happy Thanksgiving!