geneontology / geneontology.github.io

Repository for storing GO documentation, directly available through the general GO site
http://geneontology.org
MIT License

SPARQL doc page geneontology.org/docs/sparql #269

Open lpalbou opened 3 years ago

lpalbou commented 3 years ago

From https://github.com/geneontology/geneontology.github.io/issues/267#issuecomment-754567359 from @pgaudet :

Comment 1: I think the examples are wrong here ?

Example 1 : sequence specific DNA binding should be part_of DNA binding transcription factor activity, shouldn't it ?

Example 2: the DNA binding transcription factor should be linked to the downstream activities by 'positively regulates'

Right @vanaukenk @thomaspd ? Thanks, Pascale

From https://github.com/geneontology/geneontology.github.io/issues/267#issuecomment-755437353 from @lpalbou :

Thanks for the feedback.

sequence specific DNA binding should be part_of DNA binding transcription factor activity, shouldn't it ?

I don't think that's right. First, I am not sure we allow an activity (sequence specific DNA binding) to be part of another activity (DNA binding transcription factor activity): go-cam-shapes. The usual pattern is rather an activity being part of a BP. Second, the specific binding to DNA indeed triggers the transcription factor activity; without it, there is no transcription, so I do see this as a causal relationship, and part_of is not a causal relation. As a note, my thesis with Dino was on nuclear receptors such as RXR, RAR, VDR, GCR & co.

Are we set on using Hsap and Cele etc to describe species ? Again this is a non-standard, non-intuitive representation.

I would prefer the standard uniprot convention too, I think we discussed it once. But it's a larger issue independent of this page as this comes from noctua graph and our triplestore (and probably affects quite a lot of other resources). Maybe create a separate project or at least a ticket on minerva repo ?

In the Table describing the relations, you could simplify by removing the 'Description' link and making the relations themselves clickable ?

I don't have a strong opinion on this, if you think that's more readable, I can change it. My intent was to make the description explicitly visible/accessible. Just note that not all IRIs resolve to a web page (here those do), they are just identifiers.

'part_of' is a BFO term but is also present in RO - maybe 'occurs in' can also be added to RO, and in this case we may be able to claim we are using a single ontology ?

Currently, all the part_of relations in GO-CAMs refer to BFO, not RO, so this documentation has to reflect that so that users can create valid queries. In the ontology world, I don't know whether it is better to state that we are using a single ontology. part_of should probably never be in 2 ontologies in the first place, unless we mean something different.
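To illustrate why the prefix matters for writing valid queries, here is a minimal sketch in Python that expands the part_of CURIE (BFO:0000050) into its OBO PURL and places it in a generic SPARQL pattern. The helper names `obo_iri` and `part_of_query` are hypothetical, and the query shape is an assumption: it is not taken from the documentation page, and whether a GRAPH clause is needed depends on how the triple store is configured.

```python
# Sketch only: helper names and the query shape are illustrative assumptions.
OBO_BASE = "http://purl.obolibrary.org/obo/"


def obo_iri(curie: str) -> str:
    """Expand an OBO CURIE such as 'BFO:0000050' into its PURL form."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_BASE}{prefix}_{local_id}"


def part_of_query(limit: int = 10) -> str:
    """Build a generic SPARQL query matching triples that use BFO part_of."""
    part_of = obo_iri("BFO:0000050")  # part_of carries a BFO prefix
    return (
        "SELECT ?part ?whole WHERE {\n"
        f"  ?part <{part_of}> ?whole .\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(part_of_query())
```

A query written with an RO-prefixed IRI for part_of would use a different identifier and so would not match these triples, which is the practical reason the documentation lists the BFO one.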

pgaudet commented 3 years ago

Tagging @vanaukenk and @ukemi who have volunteered to provide new examples.

pgaudet commented 3 years ago

About species names: Should we have a discussion about this ? @lpalbou Are you not developing a new viewer targeted at users - maybe this should be done for the new viewer, and we can keep doing what we are doing for the curation tool ?

It would be nice to know what we want to do before opening a ticket.

Thanks, Pascale

tmushayahama commented 3 years ago

btw, just for info, another way to generate interactive SPARQL examples is to use the search API on the landing page: at the click of a button, it shows the SPARQL query that was used for any search (Ben has made this available), so this might be helpful to get more dynamic examples @vanaukenk @ukemi

image

For example, selecting production models with "species: Homo sapiens" created on 2021-01-20 results in this search API query on the production server: http://barista.berkeleybop.org/search/models?offset=0&limit=50&exactdate=2021-01-20&taxon=NCBITaxon:9606&debug which gives back the SPARQL query and the search results
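As a rough sketch, the URL above can be assembled programmatically. The parameter names (`offset`, `limit`, `exactdate`, `taxon`, `debug`) are copied from that URL; the function name `build_search_url` is hypothetical, and treating `debug` as a bare flag is an assumption based only on how it appears in the example. The code builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Base URL taken from the example above.
BASE_URL = "http://barista.berkeleybop.org/search/models"


def build_search_url(taxon: str, exact_date: str,
                     offset: int = 0, limit: int = 50,
                     debug: bool = True) -> str:
    """Assemble a model search URL; hypothetical helper, not an official client."""
    params = {
        "offset": offset,
        "limit": limit,
        "exactdate": exact_date,
        "taxon": taxon,
    }
    # Keep ':' unescaped so the taxon CURIE reads as in the example URL.
    query = urlencode(params, safe=":")
    if debug:
        # Assumption: 'debug' is a valueless flag that adds the SPARQL query
        # to the response, as described in the comment above.
        query += "&debug"
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(build_search_url("NCBITaxon:9606", "2021-01-20"))
```

Changing the `taxon` or `exact_date` arguments gives other searches whose underlying SPARQL could serve as documentation examples.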

tagging @lpalbou @cmungall @balhoff

lpalbou commented 3 years ago

I linked the SPARQL documentation page from the Tools & Guide page, so it is now accessible from the GO site (note it was already accessible through the SPARQL endpoint URL provided in the GO NAR article): https://github.com/geneontology/geneontology.github.io/pull/280

Discussion about species short names has to be handled at the GO project level as the SPARQL endpoint and the various UIs only display the information they receive from Minerva. Quick fixes on the UI side are possible if needed but not recommended as they would easily introduce discrepancies/inconsistencies between the various GO pages & solutions. If things are to change for species, please create the appropriate project and tickets.

@vanaukenk please ping me if you wish to change the examples in the SPARQL doc page. However, this documentation is aimed at developers, to help them understand the underlying data model (RDF, OWL), the associated file system (TTL, triple store) and the SPARQL language & endpoint. In essence, it teaches how to create queries, independently of what the current GO-CAM curation best practices are, which will certainly continue to evolve over time. Since rewriting this in-depth documentation does take time, I would recommend leaving the examples as they are, as they do serve their purpose: teaching how to query GO-CAMs. If you agree, I will close this ticket.

pgaudet commented 3 years ago

Hi @lpalbou Where is it accessible from ? I cannot find the link. I would expect it to be in 'tools' http://geneontology.org/docs/tools-overview/

We really need to change the example, if you still have transcription. I updated the template I had from a couple of years ago, http://noctua.berkeleybop.org/editor/graph/gomodel:59bee34700000179?model_id=gomodel:59bee34700000179

This is consistent with the papers we are publishing with the GREEKC consortium. Please use this model.

Thanks, Pascale

cmungall commented 3 years ago

I don't understand what the action should be for this ticket or who needs to be involved. Consider closing it and either (a) making multiple smaller actionable tickets or (b) making a superticket (see the GO github guide)

It seems this is mostly a ticket about content on a page somewhere? Consider making the first comment in an issue be a broad description of the problem.

Species name: I agree with @lpalbou, let us not overload this ticket. The 4 letter codes may or may not be a good idea. But if this is to be fixed, it should be fixed globally. In this case, the 4 letter codes are inserted as part of the neo build.

RO vs BFO: all our relations are in RO. Some have a BFO prefix, but they are in RO. See RO Docs. Yes, this is objectively very confusing for many people, not just GO users. But let's not try and solve that problem here.

lpalbou commented 3 years ago

The SPARQL endpoint is referenced from the API section, the http://sparql.geneontology.org, and the GO search.

@cmungall the proposal is to rewrite half of the technical SPARQL documentation with a GO-CAM that would better reflect newer curation practices (e.g. https://github.com/geneontology/go-shapes/pull/256). For the moment, the documentation uses a GO-CAM in production to illustrate how the data is linked from TTL to Triple Store, visualization and SPARQL.

The model suggested above is not in production and not valid according to shex, so this ticket is pending for an appropriate model:

Screen Shot 2021-02-25 at 2 57 26 PM

In addition, rather than rewriting half of a very detailed technical documentation that still serves its teaching purpose to the bioinformatics community, I would favor instead creating and maintaining interactive notebooks and making a better GO API.

lpalbou commented 3 years ago

I looked a bit more at the suggested model: http://noctua.geneontology.org/editor/graph/gomodel:59bee34700000179

Couple of issues unfolding:

pgaudet commented 3 years ago

Thanks for looking into that. I don't think I have ever put one of my models in production !

Thanks, Pascale