geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

Guidelines for annotating papers containing High ThroughPut (HTP) experiments #1469

Closed vanaukenk closed 3 years ago

vanaukenk commented 7 years ago

This working group will develop guidelines for curators on how to annotate results from high throughput experiments.

srengel commented 7 years ago

i'll participate.

pgaudet commented 7 years ago

I'd also like to participate.

Pascale

RLovering commented 7 years ago

I'll participate too

hattrill commented 7 years ago

I'll participate as well.

vanaukenk commented 7 years ago

I would like to participate in this group, too. I've added all of the participants to the assignees list.

ValWood commented 7 years ago

Hi,

In this working group, could you address whether we should make assignments to metabolic processes from metabolomics studies? To me this seems similar to making annotations/inferences based on expression changes in a microarray experiment. Hundreds of metabolites could fluctuate directly and indirectly in a given mutant....

Here is an example using changes in metabolite level in urine http://amigo.geneontology.org/amigo/reference/PMID:18648510

v

ValWood commented 7 years ago

This is another paper that uses metabolic fingerprinting. This one isn't HTP, but isn't this a bit like inferring metabolism from gene expression changes? http://amigo.geneontology.org/amigo/reference/PMID:17237885 Seems dodgy to annotate metabolism from a metabolic fingerprint experiment with no follow-up?

sylvainpoux commented 7 years ago

Hi,

Paola forwarded me the link. I would be interested in participating in this discussion.

Thanks

Sylvain

mchibucos commented 7 years ago

Count me in!

hattrill commented 7 years ago

Next VC at 1600 BST 16 May. Using Bluejeans.

Relevant docs: Working doc on proposal - points to discuss within. Please add to, comment and edit before, during and after VC. https://docs.google.com/document/d/1ScIeclAzUXMe-tU6n0lVfsSwMHpOeNb7uK8On9-iKXc/edit

Meeting notes: https://docs.google.com/document/d/1Gd6cRQh67QZQ2_T7kQjqYMpkfJ_Wqm0baVpyMdFPRHs/edit

Spreadsheet of HTP papers: https://docs.google.com/spreadsheets/d/11xExGJfj_39xPQUGkam3Xvtd6dtZ5DfANXhM2ZtDYB0/edit?ts=58d39700#gid=0

ValWood commented 7 years ago

Hi Helen, I'm in Banff.... Will join next call. Val

hattrill commented 7 years ago

Hi all,

With respect to using different flavours of evidence code for the annotation of HTP data:

I asked Chris about limitations on evidence codes in GAF and he said "I don't care about number of letters!"

So, I guess we'll just have to use our intuition and common sense to come up with suitable codes!

I had a poke at the list of HTP papers that Sylvain and Pascale added to the spreadsheet. I did a rather rough and ready grab of all annotations (with no attempt to distinguish those that were multiple annotations to the same term and those that were low throughput). Chucked out the non-experimental codes and counted the rest.

| Code | No. | % |
|------|------|-----|
| IMP | 9269 | 66 |
| IEP | 2296 | 16 |
| IDA | 1623 | 12 |
| IGI | 728 | 5 |
| IPI | 93 | 1 * |
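As a quick check on the tally, the percentages can be recomputed from the raw counts. This is only a sketch: the counts are copied from the figures above, and the variable names are illustrative rather than part of any GO tooling.

```python
# Annotation counts per experimental evidence code, copied from the tally above.
counts = {"IMP": 9269, "IEP": 2296, "IDA": 1623, "IGI": 728, "IPI": 93}

# Total number of experimental annotations counted.
total = sum(counts.values())

# Share of each code, rounded to the nearest whole percent.
percentages = {code: round(100 * n / total) for code, n in counts.items()}

for code, n in counts.items():
    print(f"{code}\t{n}\t{percentages[code]}%")
```

The rounded shares match the reported column (66, 16, 12, 5 and 1).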

Rather surprisingly, IEP came out rather well represented. Examples using this evidence code include "response to" terms - for this Arabidopsis paper https://www.ebi.ac.uk/QuickGO/GAnnotation?ref=16463103 they are looking at expression patterns when stimulated with a hormone. Immune response genes are also a popular theme (a worm paper: https://www.ebi.ac.uk/QuickGO/GAnnotation?ref=16968778).

[IPI 93 - 1% (actually, this is pretty much zero - it is a false positive: D.mel Dscam has >150 UniProt mappings (as it has a lot of unique isoforms). It has two self-association annotations that send it off the scale). There are a handful of other contributors, which are all LTP. So really, this number is 0.] (Perhaps IPI is under-represented as HTP protein interaction studies are often based on one experiment, and we usually look for more when we use this for binding.)

Anyway, on our list of HTP varieties, we had: mutant phenotype, direct assay, genetic interaction and physical interaction. (Assuming everyone is broadly ok with adding more granular HTP codes):

Q: Should we add expression pattern?

vanaukenk commented 7 years ago

Looks like we should probably also have an expression pattern HTP evidence code, although as with the others, we'll want to clarify its use in the guidelines.

srengel commented 7 years ago

i agree with Helen, add a code for HTP expression, and with KVA re clarifying its use in the guidelines.

RLovering commented 7 years ago

Sounds like a good decision to me too.

Ruth

hattrill commented 7 years ago

Ok - so below is a list of new evidence codes that we might want - not sure about the need for IPI because of the apparent lack of HTP annotations. I assume that this is because we are all being good and not annotating HTP IPIs (and using other ways to capture these). Does anyone have an opinion on this? Pascale had a good suggestion of aligning with IntAct's guidelines for HTP IPIs. We should probably discuss this in more detail.

HTP - inferred from high throughput experiment (equivalent to EXP for LTP)

- inferred from HTP mutant phenotype (equivalent to IMP for LTP)
- inferred from HTP genetic interaction (equivalent to IGI for LTP)
- inferred from HTP direct assay (equivalent to IDA for LTP)
- inferred from HTP expression pattern (equivalent to IEP for LTP)
- inferred from HTP physical interaction (equivalent to IPI for LTP) (?)

(LTP = low throughput)

As for the actual format of the codes themselves - do you think we could come up with some ideas....HTP (yes) .....more granular .....HTP-MP, HTPMP ?

srengel commented 7 years ago

what about just using the H instead of the I?

HMP HGI HDA HEP HPI

vanaukenk commented 7 years ago

I like Stacia's suggestion. It keeps things simple and clear, but is also generally consistent with our other evidence codes.

hattrill commented 7 years ago

I like them too! Let's put them to the GOC.

hattrill commented 7 years ago

Another Q: For the full length evidence code, the pattern of the wording should be:

1. inferred from mutant phenotype in high throughput experiment, or
2. inferred from mutant phenotype from high throughput experiment, or
3. inferred from high throughput mutant phenotype experiment, or
4. something else?

pgaudet commented 7 years ago

Hi @hattrill I like your suggestions for the full length terms.

Pascale

vanaukenk commented 7 years ago

Hi @hattrill

I vote for number two: inferred from mutant phenotype from high throughput experiment.

Thx.

srengel commented 7 years ago

i vote for a 4th option:

- Inferred from High-Throughput Mutant Phenotype (HMP)
- Inferred from High-Throughput Direct Assay (HDA)
- Inferred from High-Throughput Physical Interaction (HPI)
- Inferred from High-Throughput Genetic Interaction (HGI)
- Inferred from High-Throughput Expression Pattern (HEP)

these are the closest to the current term strings, with only the insertion of 'high-throughput'
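The naming scheme proposed so far can be written down as a simple lookup. The sketch below only restates the codes and names suggested in this thread; the dictionary and function names are hypothetical and do not come from any GO software.

```python
# Proposed high-throughput (HTP) codes, keyed by the low-throughput (LTP)
# code each one parallels. HTP itself is the general code, parallel to EXP.
HTP_EQUIVALENT = {
    "EXP": "HTP",
    "IMP": "HMP",
    "IGI": "HGI",
    "IDA": "HDA",
    "IEP": "HEP",
    "IPI": "HPI",  # the thread was unsure whether this one is needed
}

# Full names as proposed in the comment above.
FULL_NAME = {
    "HMP": "Inferred from High-Throughput Mutant Phenotype",
    "HDA": "Inferred from High-Throughput Direct Assay",
    "HPI": "Inferred from High-Throughput Physical Interaction",
    "HGI": "Inferred from High-Throughput Genetic Interaction",
    "HEP": "Inferred from High-Throughput Expression Pattern",
}


def to_htp_code(ltp_code: str) -> str:
    """Return the proposed HTP counterpart of a low-throughput code."""
    if ltp_code not in HTP_EQUIVALENT:
        raise ValueError(f"no HTP counterpart proposed for {ltp_code!r}")
    return HTP_EQUIVALENT[ltp_code]


print(to_htp_code("IMP"), "=", FULL_NAME[to_htp_code("IMP")])
```

Apart from EXP/HTP, every pair follows the "swap the I for an H" rule, which is what keeps the new codes recognisable next to the existing ones.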

could we also have an HSM? Inferred from High-Throughput Sequence Model. one of the papers curated at SGD as HTP GO is all ISM. it's one of the examples i added to Ruth's spreadsheet.

vanaukenk commented 7 years ago

Yes, @srengel - those names would be good - even simpler!

hattrill commented 7 years ago

Hi @mchibucos Do you have any thoughts on mapping to ECO?

mchibucos commented 7 years ago

@hattrill I like the names suggested by @srengel. I think that we would create a subclass for each GO HT evidence code under the recently created 'high throughput evidence' ECO:0006055.

Then we would logically define each ECO class using the non-HT class, for example something like "HMP evidence has_part IMP evidence". "Has part" is probably not the right relation, but my point is that we could potentially structure it like that. Perhaps @cmungall can suggest a correct RO relation, and comment on whether my proposed structure seems reasonable?

Also there's the consideration as to whether the idea that "HMP evidence has_part IMP evidence" is correct. Does that make sense?

ValWood commented 6 years ago

Will the new evidence codes be added here: http://geneontology.org/page/guide-go-evidence-codes ? We link to these for an explanation from the PomBase gene pages.

hattrill commented 6 years ago

I think that would be the sensible place. I'll take a look at the feedback for the guidelines and get them up on the wiki in the next couple o' weeks.

RLovering commented 6 years ago

Hi

I am still not happy that there is no option for ISS/PAINT/Ensembl transfer of information to orthologs. I don't mind if there is a new code but without transfer of this information to orthologs it does make MOD HTP data unusable for human analyses.

Is there any chance that we can revisit this decision?

Thanks

Ruth

@rachhuntley @NancyCampbell @BarbaraCzub

hattrill commented 6 years ago

Ok. Here's my suggestion - let's take a look at it in the new year. Go through some examples and see what conclusion(s) we come to. How about a meeting mid-Feb? By then everyone should have done their HTP re-assignments. If we can't come to a decision, then we need to put the pros and cons to the consortium folks to get a consensus.

ValWood commented 6 years ago

Is it good to transfer HTP experiments between organisms?

This could occur in 2 ways:

1. Via a pipeline, which would be IEA (but probably these evidence codes should not be transferred via an automated pipeline at all).

2. On a case-by-case basis. In this case the curator will have assessed all of the available information. I doubt that a curator would often use an HTP evidence for transfer, but sometimes they might, if it is very clearly correct. I have done so, for example, to get a "mitochondrial" annotation onto a protein which is known to have a mitochondrial function, when the only evidence is an HTP annotation at SGD. In this case, if I am convinced enough that I can make the annotation, ISS seems to be an appropriate evidence code to use.

For PAINT these would still be IBA? which also seems fine.

So, what type of transfer would additional evidence codes apply to?

pgaudet commented 6 years ago

Hello,

I thought we said PAINT could transfer HTP annotations, since the annotations are reviewed. We could make the same arguments for all manual codes (ie, ISS). IEAs should definitely not be allowed.

Does that seem OK ?

Thanks, Pascale

hattrill commented 6 years ago

I agree, no for IEAs.

I think we said IBA if they were supported by one other exp annotation.

Case-by-case ISS seems, on the face of it, fine - if these are indeed case-by-case.

I think that Ruth referred to an example of larger transfer, pig to human for a heart set (?) at the GOC meeting.

There are examples, like the pipeline between mouse and rat, where an ISS/ISO code is used in an automated pipeline.

RLovering commented 6 years ago

Thanks for all these comments.

As far as I am aware ISS/ISO are never used in an automated pipeline. All of the annotations I ISS'd from pig to human I checked manually that the protein was the equivalent ortholog. However, I will check for each that the location of the protein in the extracellular matrix is relevant for the known role of the protein (if there is one). These annotations also specify the tissue in the AE field so each one is likely to be adding new information for the human protein.

Before I do change the evidence code to HTP, is this an agreed decision that the whole GOC is happy with? Because Tony has already set up Protein2GO to prevent annotations with the HTP evidence code from being ISS'd to orthologs etc. I do not want to request this option is removed and then have to request it is reinstated.

I do feel that in many ways IBA annotations and ISS annotations are equivalent and it does seem odd to allow one without allowing the other.

Please could we also confirm whether IBA annotations from HTP data are created only for orthologs or for paralogs too? I think for cellular component and MF annotations possibly paralogs would be ok, but I think only high level BPs can go to paralogs.

Obviously we will need specific guidelines for IBA/ISS etc annotations to be added to the guideline doc

Thanks

Ruth

@tonysawfordebi

ValWood commented 6 years ago

"Because Tony has already set up Protein2GO to prevent annotations with the HTP evidence code from being ISS'd to orthologs etc."

I don't fully understand this. How can ISS from HTP be prevented? That's an individual curator call, and we say above that ISS is not generated by a pipeline.

How would @tonysawfordebi know if an ISS has been made from an HTP? They are supported by GO_REF:0000024. I'm more confused ;)

val

hattrill commented 6 years ago

We need to get agreement from the GOC.

Let's discuss this on a call mid-Feb.

When we talked about it in the HTP WG, we decided that the point of using HTP codes was that it made the provenance clear. As ISS transfer loses this info, we decided not to allow it.

Those in favour of ISS transfer of HTP, come up with some examples and rules.

Val: in answer to your Q, Protein2GO has QC rules that stop the annotation being made. It also reports when the source annotation is removed.

tonysawfordebi commented 6 years ago

The original decision was that ISS transfers should not be allowed from annotations with a high-throughput evidence code, so that's what I've implemented in P2G - if a curator tries to use an HTP-evidenced annotation as the source of an ISS transfer, it's not allowed.

If the decision is made that ISS from HTP is a valid thing to do, then I can easily remove the restriction from P2G. I don't make the rules; I just implements 'em.

ValWood commented 6 years ago

But if the ISS is supported by GO_REF:0000024, how do you know the source of the ISS? You only have the ID of the object of annotation transfer in the "with" field.... Do you check the existing annotations and only allow it if there is a non-HTP evidence code?

tonysawfordebi commented 6 years ago

We're talking about P2G, where, in order to do an ISS transfer of an annotation to another gene product, you first have to select the annotation that you want to transfer. If that source annotation has an HTP evidence code, then P2G will not allow the transfer.
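The rule described here can be sketched in a few lines. This is not Protein2GO's actual code: the `Annotation` record, the function name and the example gene product ID are all hypothetical, and the set of HTP codes is taken from the proposals earlier in this thread.

```python
from dataclasses import dataclass

# High-throughput codes proposed in this thread. HPI is included even though
# the thread was unsure whether it is needed.
HTP_EVIDENCE_CODES = frozenset({"HTP", "HMP", "HGI", "HDA", "HEP", "HPI"})


@dataclass(frozen=True)
class Annotation:
    """Hypothetical, simplified annotation record (not the P2G data model)."""
    gene_product: str
    go_term: str
    evidence_code: str


def iss_transfer_allowed(source: Annotation) -> bool:
    """Return False when the selected source annotation has an HTP evidence code."""
    return source.evidence_code not in HTP_EVIDENCE_CODES


# The curator first selects a source annotation, then the check is applied.
ltp_source = Annotation("example-protein-1", "GO:0005739", "IDA")
htp_source = Annotation("example-protein-1", "GO:0005739", "HDA")
print(iss_transfer_allowed(ltp_source))  # True
print(iss_transfer_allowed(htp_source))  # False
```

Because the check runs on the selected source annotation, it needs nothing from the "with" field of the resulting ISS annotation.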

ValWood commented 6 years ago

ah ok i get it...

RLovering commented 6 years ago

Please could you tell me the reason why IBA annotations can be created using HTP data but not ISS

Thanks

Ruth

hattrill commented 6 years ago

They would also be supported by another experimental annotation - is that correct @pgaudet? i.e. they would count towards the evidence for transfer, but would not be the sole support for.

RLovering commented 6 years ago

and why can't that rule be applied for ISS?

hattrill commented 6 years ago

Then you could just make the transfer from the EXP via ISS. I think an IBA needs support from more than 1 annotation.
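One reading of the rule being discussed, that HTP annotations may count towards the evidence for a transfer but cannot be the sole support, could be sketched as below. This is an assumption-laden illustration of that reading only: the function name is hypothetical, it is not how PAINT is implemented, and later comments in this thread question whether more than one annotation is really required.

```python
# Low-throughput experimental evidence codes.
LTP_EXPERIMENTAL = frozenset({"EXP", "IDA", "IMP", "IGI", "IEP", "IPI"})


def can_support_transfer(evidence_codes) -> bool:
    """Return True if at least one supporting annotation is low-throughput.

    HTP-coded annotations may appear in the list and add weight, but under
    this reading they are never sufficient on their own.
    """
    return any(code in LTP_EXPERIMENTAL for code in evidence_codes)


print(can_support_transfer(["HDA"]))         # False: HTP is the sole support
print(can_support_transfer(["HDA", "IDA"]))  # True: backed by an LTP annotation
```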

BarbaraCzub commented 6 years ago

In line with the options suggested above, perhaps there could be a code called 'HBA' for high throughput IBA, and an 'HSS' for high throughput ISS? This way it would be clear where the data came from originally, and how it's been propagated.

ValWood commented 6 years ago

But why would we be propagating annotation from IBA? The PAINT pipeline should already have taken care of the appropriate annotations?

...and I still believe that any high throughput ISS should be IEA, but I was overruled on that many years ago, and now I just spend a chunk of my life reporting problems derived from them...but hey ho...

BarbaraCzub commented 6 years ago

What I meant by HBA would not be propagating from IBA, but to IBA from high throughput IDA/IMP (so from HDA, HMP, based on the pattern suggested above). So the code HBA would mean that, what would otherwise have been IBA, was generated based on an original HTP EXP annotation.

Similarly, HSS would mean that, what would otherwise have been an ISS, was generated based on original HTP EXP annotation.

ValWood commented 6 years ago

But IBA is not made from a single annotation?

krchristie commented 6 years ago

IBA can be from a single annotation. It depends on what you know. When I annotated numerous families of proteins found in the 90S preribosome and/or SSU processome, these proteins are well characterized in yeast, but usually not in anything else. However, there is also another lovely paper showing the high level of conservation of SSUP proteins across a huge number of taxonomic groups. I considered a single yeast annotation combined with this evolution paper, and also sometimes personal knowledge that the yeast term might be too specific with respect to the type of rRNA transcript that is processed. This combination of knowledge allowed me to propagate single annotations in these families, often going up to a more general level of rRNA processing term.

ValWood commented 6 years ago

But a single HTP annotation would not be propagated via PAINT if not in combination with additional information, would it?