geneontology / amigo

AmiGO is the public interface for the Gene Ontology.
http://amigo.geneontology.org
BSD 3-Clause "New" or "Revised" License

Useful/intuitive fields are not included in some searches #92

Closed ValWood closed 10 years ago

ValWood commented 10 years ago

I used the gene products search with GO:0034613

I expected this to retrieve a list of gene products annotated with GO:0034613 (and descendents), but the search retrieves only 65 results (and none from fission yeast), so it does not seem to retrieve all gene products with direct annotations.

kltm commented 10 years ago

Is this what you're looking for?

http://amigo2.berkeleybop.org/cgi-bin/amigo2/amigo/search/bioentity?q=*:*&fq=regulates_closure_label:%22cellular%20protein%20localization%22&fq=taxon_closure_label:%22Schizosaccharomyces%20pombe%22&sfq=document_category:%22bioentity%22

The issue here is that your method of retrieval (using the GO ID GO:0034613) does not really capture anything useful, since neither the search nor the facet search uses the regulates_closure field, only regulates_closure_label. Thus, a search for "cellular protein localization" is likely returning the correct answer.

We probably need to review what fields are in what search aspect. Looking around, we seem to consistently /not/ include id closures in cases like this, probably because they do not show up in any aspect of the user interface.
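As a rough sketch of how the filter URL above is put together, the snippet below rebuilds it from (field, value) pairs using only the Python standard library. The helper name `build_search_url` is hypothetical, and filtering directly on the `regulates_closure` ID field through `fq` is an assumption; this thread only shows the label field being used that way.

```python
from urllib.parse import quote, urlencode


def build_search_url(base, aspect, filters):
    """Build an AmiGO-style search URL from (field, value) filter pairs."""
    params = [("q", "*:*")]
    # Each filter becomes fq=field:"value"; the quotes keep multi-word labels intact.
    params += [("fq", f'{field}:"{value}"') for field, value in filters]
    params.append(("sfq", f'document_category:"{aspect}"'))
    # Leave '*' and ':' unescaped so the result matches the URL style in this thread.
    query = urlencode(params, quote_via=quote, safe="*:")
    return f"{base}/{aspect}?{query}"


BASE = "http://amigo2.berkeleybop.org/cgi-bin/amigo2/amigo/search"

# Filter on the label closure, as in the URL above.
by_label = build_search_url(BASE, "bioentity", [
    ("regulates_closure_label", "cellular protein localization"),
    ("taxon_closure_label", "Schizosaccharomyces pombe"),
])

# Assumption: the ID closure field can be filtered in the same way.
by_id = build_search_url(BASE, "bioentity", [
    ("regulates_closure", "GO:0034613"),
    ("taxon_closure_label", "Schizosaccharomyces pombe"),
])

print(by_label)
print(by_id)
```

The first URL reproduces the link given earlier in this comment; the second differs only in the first filter, which is the ID-based lookup the original report was attempting.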

kltm commented 10 years ago

The ideal solution might be to let the user decide dynamically which fields are searched, with hints in the UI as to which ones matched. This is a rather large enhancement, however.

kltm commented 10 years ago

The simplest solution would be to include the ID closures as well as the label closures. I need to try and remember if there is something wrong with that.

kltm commented 10 years ago

A little chat with Chris. The easiest thing to do here would be to just add the regulates_closure field to the search (boost_weight) so that there was some sensible activity when somebody puts in an ID, even though there would be no visual cue as to what was happening. The direct input of IDs like this is probably from internal/consortium members, so we could consider it a little like a power feature.
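To make the effect concrete, here is a toy model (not AmiGO's or Solr's actual scoring) of a search that only looks at a configured set of boosted fields. The document layout, the `bioentity_label` field name, the `score` function, and the weights are all illustrative assumptions; only `regulates_closure` and `regulates_closure_label` come from this thread.

```python
# Toy index entry: one gene product with both label and ID closures.
DOC = {
    "bioentity_label": "cdc31",  # hypothetical field name
    "regulates_closure_label": ["cellular protein localization"],
    "regulates_closure": ["GO:0034613"],
}


def score(doc, query, boosts):
    """Sum the boost of every searched field that contains the query text."""
    total = 0.0
    for field, weight in boosts.items():
        values = doc.get(field, [])
        if isinstance(values, str):
            values = [values]
        if any(query.lower() in value.lower() for value in values):
            total += weight
    return total


# Searched fields before and after the proposed change (weights are illustrative).
labels_only = {"bioentity_label": 1.0, "regulates_closure_label": 1.0}
with_ids = {**labels_only, "regulates_closure": 1.0}

print(score(DOC, "GO:0034613", labels_only))  # ID never matches a label field
print(score(DOC, "GO:0034613", with_ids))     # ID now matches regulates_closure
print(score(DOC, "cellular protein localization", labels_only))
```

With only label fields searched, the ID scores zero even though the document carries it; adding the ID closure to the searched fields is enough for it to match, with no change to what the UI displays.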

ValWood commented 10 years ago

This query http://amigo2.berkeleybop.org/cgi-bin/amigo2/amigo/search/bioentity?q=*:*&fq=regulates_closure_label:%22cellular%20protein%20localization%22&fq=taxon_closure_label:%22Schizosaccharomyces%20pombe%22&sfq=document_category:%22bioentity%22

is partway to what I want. It gives me the list of genes annotated to "cellular protein localization" and its children. However, I want to retrieve the actual annotations to "cellular protein localization" (with extensions, in GAF file format), not the gene products.

How do I do this query?

Also, I'm sure that searching on an ID should behave exactly the same as searching on a term name. I don't think this is a 'power feature'. Many users will come to AmiGO with a GO ID from another source, because it is easier to cut and paste an ID than a long term name. For example, the PomBase user who prompted this query says: "I downloaded the text file, and searched for all instances of: 'GO:0034613'".

kltm commented 10 years ago

We have three aspects currently available in the search interfaces: annotation, term, and gene product. Since you requested "a list of gene products annotated with GO:0034613 (and descendents)", that would be a gene product query. However, it looks like what you're actually interested in is the annotations, which would be a different aspect. You'd do essentially the same thing, except using the annotation aspect:

[Search] > [Annotations]
cellular protein localization
[Inferred annotation] > "cellular protein localization"
(clear free-text filtering using [X])
[Taxon] > "Fungi"
[Taxon] > "Schizosaccharomyces pombe"

http://amigo.geneontology.org/amigo/search/annotation?q=*:*&fq=regulates_closure_label:%22cellular%20protein%20localization%22&fq=taxon_closure_label:%22Schizosaccharomyces%20pombe%22&sfq=document_category:%22annotation%22

kltm commented 10 years ago

Our intention is to add the IDs to the general search on each aspect.

ValWood commented 10 years ago

Yes, that's the query I wanted, and I see that I can download the GAF from here. BUT a bit of the GAF is missing: no relation is written out for the annotation extension. See, e.g.:

PomBase SPCC1682.04 cdc31 GO:0034613 PMID:12857865 IMP P centrin protein NCBITaxon:4896 20050715 PomBase PomBase:SPBC12D12.01

PomBase SPAC20G4.07c sts1 GO:0071210 PMID:18310029 IMP P C-24(28) sterol reductase Sts1 erg4 protein NCBITaxon:4896 20091124 PomBase PomBase:SPAC1071.10c
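A quick way to spot this in a downloaded file is to check each annotation extension value against the `relation(ID)` form that GAF 2.x uses for that column. The sketch below assumes the extension string has already been pulled out of the row; the function name and the relation `some_relation` are hypothetical placeholders, not the relation that should appear in these annotations.

```python
import re

# GAF 2.x annotation extension entries have the form relation(ID),
# with multiple entries joined by ',' (AND) or '|' (OR).
EXTENSION_ENTRY = re.compile(r"^[A-Za-z_][\w ]*\([^()]+\)$")


def entries_missing_relation(extension):
    """Return the extension entries that are bare IDs with no relation wrapper."""
    entries = [e.strip() for e in re.split(r"[,|]", extension) if e.strip()]
    return [e for e in entries if not EXTENSION_ENTRY.match(e)]


# Value as it appears in the first row above: a bare ID, so it is reported.
print(entries_missing_relation("PomBase:SPBC12D12.01"))

# Hypothetical well-formed value: nothing is reported.
print(entries_missing_relation("some_relation(PomBase:SPBC12D12.01)"))
```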

kltm commented 10 years ago

Probably an issue along the lines of #65

kltm commented 10 years ago

The action on this is to include the closure fields in the searches even though they do /not/ have any UI expression.

kltm commented 10 years ago

Changes:
ont-config.yaml: regulates_closure^1.0 regulates_closure_label^1.0
bio-config.yaml: regulates_closure^1.0
ann-config.yaml: regulates_closure^1.0 regulates_closure_label^1.0
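For readers unfamiliar with the `field^weight` notation in that change list, the sketch below parses it into a field-to-weight mapping and renders it back in the form Solr's DisMax `qf` parameter accepts. The `parse_boosts` helper is hypothetical, and how AmiGO actually reads these YAML files (key names, layout) is not shown in this thread, so only the values are taken from it.

```python
def parse_boosts(spec):
    """Turn 'field^weight field^weight' into an ordered {field: weight} dict."""
    boosts = {}
    for token in spec.split():
        field, _, weight = token.partition("^")
        # A field listed without an explicit weight defaults to 1.0 here.
        boosts[field] = float(weight) if weight else 1.0
    return boosts


# Values copied from the change list in this comment, keyed by file name.
CHANGES = {
    "ont-config.yaml": "regulates_closure^1.0 regulates_closure_label^1.0",
    "bio-config.yaml": "regulates_closure^1.0",
    "ann-config.yaml": "regulates_closure^1.0 regulates_closure_label^1.0",
}

for filename, spec in CHANGES.items():
    boosts = parse_boosts(spec)
    # Render the mapping back as a space-separated field^weight string.
    qf = " ".join(f"{field}^{weight}" for field, weight in boosts.items())
    print(filename, "->", qf)
```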

kltm commented 10 years ago

Keep in mind that this was once upon a time considered to be a /bad/ thing, but it was reconsidered due to Val's input.