geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License

Update URLs for Xenbase #1206

Open vanaukenk opened 4 years ago

vanaukenk commented 4 years ago

The current crop of URLs for Xenbase in the db-xrefs.yaml file does not appear to be working.

@malcolmfisher103 may be able to help us get the correct ones?

We'd like to have link outs from the Noctua Form 2.0 autocomplete to Xenbase for Xenbase gene IDs.

malcolmfisher103 commented 4 years ago

I'll look into this.

Nothing looks obviously wrong at first glance, but maybe I am misinterpreting how the various fields are being used.

Is the autocomplete using the ID syntax regexes to get the IDs and substituting those into the given 'url_syntax:' format to generate appropriate URLs? Are the 'rdf_uri_prefix:' fields being used at all?
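For readers following the thread, the mechanism being asked about could look roughly like the sketch below. It is not taken from the AmiGO or Noctua code; the `build_url` helper and the `id_syntax` value are hypothetical, and only the `url_syntax` form and the `[example_id]` placeholder come from this ticket.

```python
import re

# Hypothetical entry modeled on the field names mentioned in this ticket.
# The id_syntax regex is an assumption, not copied from the real file.
ENTRY = {
    "id_syntax": r"XB-GENE-[0-9]+",
    "url_syntax": "http://www.xenbase.org/common/xsearch.do?searchValue=[example_id]",
}


def build_url(local_id, entry):
    """Return a link-out URL for local_id, or None if it fails id_syntax."""
    # Validate the whole identifier against the declared syntax.
    if re.fullmatch(entry["id_syntax"], local_id) is None:
        return None
    # Plain string substitution of the placeholder used in url_syntax.
    return entry["url_syntax"].replace("[example_id]", local_id)


if __name__ == "__main__":
    print(build_url("XB-GENE-6255962", ENTRY))
```

In this sketch an identifier that does not match `id_syntax` yields no link at all, which is one plausible way a link-out could silently disappear.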

vanaukenk commented 4 years ago

Thanks, @malcolmfisher103. We've started another ticket in the AmiGO tracker to try to resolve this issue. We'll let you know if we need any more help from Xenbase.

malcolmfisher103 commented 4 years ago

Great.

Just FYI, I have discovered some irregularities with how the given URLs resolve currently -

The http://www.xenbase.org/common/xsearch.do?searchValue=[example_id] form is lacking some restrictions that can lead to issues.

With the restrictions added, the form should be: http://www.xenbase.org/common/xsearch.do?exactSearch=true&searchIn=7&searchValue=[example_id]

kltm commented 4 years ago

Now:

http://www.xenbase.org/common/xsearch.do?exactSearch=true&searchIn=7&searchValue=XB-GENE-6255962

http://www.xenbase.org/common/xsearch.do?exactSearch=true&searchIn=7&searchValue=[example_id]

kltm commented 4 years ago

Reopening as we have not "really" solved the issue here; see: https://github.com/geneontology/amigo/issues/581#issuecomment-548996135

malcolmfisher103 commented 4 years ago

Hi Seth, any idea what is causing the problem? The re-formatted links work using basic substitution. Does anything get parsed out by the processes ingesting the YAML that might conflict with the messy search-query nature of our links?

There is an identifiers.org referrer we could try instead; it has the syntax 'https://identifiers.org/xenbase:XB-GENE-[example_id]'. I had been avoiding it because it only works for genes, not any of our other bioentities, so our link syntaxes in the 'db-xrefs.yaml' file would be less consistent.

kltm commented 4 years ago

@malcolmfisher103 Quickly reviewing this item, I seemed to think that the core issue was as described here: https://github.com/geneontology/amigo/issues/578 Essentially, the way the CURIEs work is that a single "namespace" must unfold in a single step to a single URL scheme. I seemed to believe that you had different types of entities that required different URLs all in the same namespace (which goes against how CURIEs work). You seem to suggest something similar in your reason for not wanting to use identifiers.org. The solutions would be to either split your entities into different namespaces or have a single universal resolver at your end that can sort entities to the right locations.
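The "single namespace unfolds to a single URL scheme" rule can be illustrated with a small sketch. This is not GO's resolver; `PREFIX_MAP` and `expand_curie` are hypothetical names, the `Xenbase` namespace label is an assumption, and the URL templates are the ones quoted in this ticket.

```python
# Hypothetical prefix map: each namespace has exactly one URL template,
# so the entity type inside the local ID cannot change the target.
PREFIX_MAP = {
    "Xenbase": (
        "http://www.xenbase.org/common/xsearch.do"
        "?exactSearch=true&searchIn=7&searchValue=[example_id]"
    ),
    "ZFIN": "http://zfin.org/cgi-bin/ZFIN_jump?record=[example_id]",
}


def expand_curie(curie):
    """Expand 'Namespace:LocalID' to a URL in one step."""
    namespace, sep, local_id = curie.partition(":")
    if not sep or namespace not in PREFIX_MAP:
        raise KeyError(f"no URL template for CURIE {curie!r}")
    return PREFIX_MAP[namespace].replace("[example_id]", local_id)


if __name__ == "__main__":
    print(expand_curie("Xenbase:XB-GENE-6255962"))
```

Because the lookup key is the namespace alone, any per-type routing has to happen either by using distinct namespaces or on the server that receives the expanded URL.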

malcolmfisher103 commented 4 years ago

OK, do I need something more like ZFIN's combined entry then?

    - type_name: variation
      type_id: VariO:0001
      id_syntax: ZDB-(GENE|GENO|MRPHLNO)-[0-9]{6}-[0-9]+
      url_syntax: http://zfin.org/cgi-bin/ZFIN_jump?record=[example_id]
      example_id: ZFIN:ZDB-GENE-990415-103
      example_url: http://zfin.org/cgi-bin/ZFIN_jump?record=ZDB-GENE-990415-103

The reason the identifiers.org system doesn't work is that it ignores the 'XB-[content type]-' element of the ID and only uses the numeric section. It treats 'xenbase:6255962' as equivalent to 'xenbase:XB-GENE-6255962', so it won't work for anything that is not a gene.

As it stands in 'db-xrefs.yaml', all of the entries resolve the same way except the genes, which we changed in case that was causing this issue. They can all be changed to the updated 'http://www.xenbase.org/common/xsearch.do?exactSearch=true&searchIn=7&searchValue=[identifier]' syntax. Would that resolve the issue, or is the problem simply having different types resolving with the same CURIE, even if they resolve the same way?

kltm commented 4 years ago

@malcolmfisher103 identifiers.org and CURIEs (and how GO resolves URLs from CURIEs) in general just work like that on purpose. If one wants different mappings to URLs, that must be taken care of either by "namespace" (changing the namespace portion of the identifier to differentiate it) or with a mapper endpoint (like ZFIN does in your example above). For an odd historical reason, the syntax of the db-xrefs.yaml file allows for different mappings to be present, but only one can be obeyed, and for anything that we have to resolve it needs to be unique. We could indeed try to use your exact search as the mapping endpoint, if you think that is the most expedient and useful. If so, I would be interested in knowing what the searchIn=7 is doing.

malcolmfisher103 commented 4 years ago

From a brief consultation, it seems that the 'searchIn=7' element was supposed to restrict the search to a gene/feature table in our database; it may not be necessary given the 'exactSearch' element. So if everything resolves using the same method, can they still be kept as separate type_name entries, or do I need to combine them into a generic syntax?

kltm commented 4 years ago

Cheers, just curious.

Either one is fine. Currently, the GO does not make use of the various entity type entries; that was driven by an EBI use case. The GO just takes one (the first?) and uses it. Whether they are all the same or combined into one does not make a mechanical difference for us, but you may have a preference for your own use and sanity...

malcolmfisher103 commented 4 years ago

I'll try to keep them separate in case we need to use the individual types for something else; I know David mentioned possibly populating with/from fields from lookups at some future point.

My only concern is that when this ticket was originally raised all the URLs were of the same form, which makes me doubt that this could have been the initial problem.

Is it worth just removing all the non-gene entries just for now and see if that works?

kltm commented 4 years ago

@malcolmfisher103 Assuming my past self was correct on the issue, uniformly updating the metadata to a single form for all entity_types should fix it--this is an otherwise known issue. If not, we're going to have to start digging around for other possible causes. But either way, they should be uniform to prevent confusion as that is how the system should work.
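A quick way to check for the non-uniformity described here is sketched below. The record layout (an `entity_types` list whose items carry `url_syntax`) is an assumption based on the fields quoted in this ticket, the type names are made up, and "take the first entry" mirrors the tentative description above rather than confirmed GO behavior.

```python
SEARCH = (
    "http://www.xenbase.org/common/xsearch.do"
    "?exactSearch=true&searchIn=7&searchValue=[example_id]"
)
OLD = "http://www.xenbase.org/common/xsearch.do?searchValue=[example_id]"


def first_url_syntax(db_entry):
    """Pick the url_syntax of the first entity type (assumed consumer behavior)."""
    return db_entry["entity_types"][0]["url_syntax"]


def distinct_url_syntaxes(db_entry):
    """Return the sorted distinct url_syntax values; more than one is ambiguous."""
    return sorted({t["url_syntax"] for t in db_entry["entity_types"]})


# Hypothetical records: type names are placeholders, not real metadata.
mixed = {
    "database": "Xenbase",
    "entity_types": [
        {"type_name": "gene", "url_syntax": SEARCH},
        {"type_name": "other", "url_syntax": OLD},
    ],
}
uniform = {
    "database": "Xenbase",
    "entity_types": [
        {"type_name": "gene", "url_syntax": SEARCH},
        {"type_name": "other", "url_syntax": SEARCH},
    ],
}

if __name__ == "__main__":
    for name, entry in (("mixed", mixed), ("uniform", uniform)):
        print(name, len(distinct_url_syntaxes(entry)), first_url_syntax(entry))
```

With a uniform record, it no longer matters which entity type a consumer happens to pick.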

kltm commented 4 years ago

I've added this to check for the next update.

kltm commented 4 years ago

@malcolmfisher103 It looks like these are now getting routed correctly in production AmiGO.

malcolmfisher103 commented 4 years ago

Thanks, @kltm. The links look good in AmiGO, but I came across a handful of gene products which have XB-GENE IDs in place of symbols. I don't see how this could be a result of the issue addressed in this ticket, but I thought I would check before raising a new ticket. The problematic entries are the first results for the following search: 'http://amigo.geneontology.org/amigo/search/bioentity?q=xb-gene'.

kltm commented 4 years ago

@malcolmfisher103 Using "XB-GENE-945791" as an example, the reason for "no symbol" is that the supplying GAF (paint_other_valid.gaf.gz) provides that as a symbol. This would be a data issue with PAINT.

malcolmfisher103 commented 4 years ago

@kltm While the links in AmiGO are working, the ones in Noctua still don't seem to link out. Are these waiting on the restart of some service, or is there still an issue?

kltm commented 4 years ago

@tmushayahama Would it be possible for you to bump amigo2-instance-data to 2.5.13 on dev to see if we can clear this ticket (referencing geneontology/go-site#1206)?

Longer term, we may want to consider having people grab the most recent file directly on startup (http://snapshot.geneontology.org/metadata/db-xrefs.json) or adding this as something that can be delivered by barista(?). Something to ponder later on, especially as not all clients are necessarily our flavor of JS.
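As a rough illustration of the "grab the most recent file on startup" idea, a client might do something like the following. This is only a sketch: the function names are hypothetical, and the assumed JSON layout (a list of records, each with a `database` key) has not been verified against the snapshot file here.

```python
import json
from urllib.request import urlopen

# URL quoted in this ticket.
DB_XREFS_URL = "http://snapshot.geneontology.org/metadata/db-xrefs.json"


def load_db_xrefs(url=DB_XREFS_URL, timeout=10):
    """Fetch and parse the current db-xrefs JSON (performs a network call)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def index_by_database(records):
    """Index records by their 'database' key (assumed field name)."""
    return {rec["database"]: rec for rec in records if "database" in rec}


if __name__ == "__main__":
    # Done once at startup; callers should handle network failures.
    xrefs = index_by_database(load_db_xrefs())
    print(len(xrefs), "databases loaded")
```

Fetching at startup would remove the need to bump a packaged data release for each metadata fix, at the cost of a network dependency when the client starts.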

pgaudet commented 2 years ago

@kltm Should this be in the Ongoing data QC and pipeline maintenance project?

kltm commented 2 years ago

@pgaudet I don't think it should be for the moment. It looks like the last item was for a Noctua update. I would be tempted to close this out and open a new issue in Noctua if this is still actually a problem (referencing here).