PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

Find Pathways (URIs) by member Entity URIs or IDs #288

Closed: IgorRodchenkov closed this issue 6 years ago.

IgorRodchenkov commented 6 years ago

This web service query could be implemented in either of the following ways, or both:

(Or, we could add a new WS endpoint.)

IgorRodchenkov commented 6 years ago

A PC search query, for entity-type hits, returns an array of pathway URIs in the "pathway" field (JSON or XML). Currently, we cannot search by URI (or part of one), even if we know some URIs, because the Lucene index field "uri" is a StoredField (stored, but not indexed). So, here is an idea:
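Why a stored-only "uri" field cannot be searched can be shown with a toy Python model. This is not Lucene's API; the class and method names are hypothetical. A stored value is only returned with a hit, while an indexed, un-analyzed value (as with a Lucene StringField) also goes into the inverted index as one exact, case-sensitive token.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical toy model of a Lucene-like index (NOT the Lucene API).

    Every field value is stored (returned with hits), but only fields listed
    in `indexed_fields` are added to the inverted index and can be searched.
    """

    def __init__(self, indexed_fields):
        self.indexed_fields = set(indexed_fields)
        self.stored = []  # doc id -> {field: value}
        # field -> token -> set of doc ids
        self.postings = defaultdict(lambda: defaultdict(set))

    def add(self, doc):
        doc_id = len(self.stored)
        self.stored.append(dict(doc))
        for field, value in doc.items():
            if field in self.indexed_fields:
                # Like a StringField: the whole value is one exact token
                # (no tokenizing, no lower-casing).
                self.postings[field][value].add(doc_id)
        return doc_id

    def search(self, field, value):
        """Return stored docs whose indexed `field` equals `value` exactly."""
        ids = self.postings.get(field, {}).get(value, set())
        return [self.stored[i] for i in sorted(ids)]


if __name__ == "__main__":
    uri = "http://pathwaycommons.org/pc2/Protein_0d4308790e68d98cdb1ce80c706e2e0e"
    stored_only = ToyIndex(indexed_fields=[])         # "uri" as a StoredField
    stored_only.add({"uri": uri})
    print(stored_only.search("uri", uri))             # [] : stored, not searchable
    string_field = ToyIndex(indexed_fields=["uri"])   # "uri" as a StringField
    string_field.add({"uri": uri})
    print(string_field.search("uri", uri))            # one hit
```

The exact-token behaviour also means such a field matches only the full value with the same letter case.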

If we replace StoredField with StringField on this line and re-index the entire BioPAX model, then queries like http://www.pathwaycommons.org/pc2/search?q=uri:"http://pathwaycommons.org/pc2/Protein_0d4308790e68d98cdb1ce80c706e2e0e" will work and return at most one hit, with all its pathway URIs (one could also submit a list of URIs as q=uri:"A" uri:"B" uri:"C").
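A client could build such a multi-URI query as sketched below, assuming the /search endpoint and q parameter shown above. The helper names are hypothetical. Space-separated clauses rely on the Lucene query parser's default OR operator, assuming the server does not change it.

```python
from urllib.parse import urlencode


def uri_query(uris):
    """Build a q value like: uri:"A" uri:"B" uri:"C" (one quoted clause per URI)."""
    return " ".join(f'uri:"{u}"' for u in uris)


def search_url(q, base="http://www.pathwaycommons.org/pc2/search"):
    """Return the /search URL with q percent-encoded (spaces become '+')."""
    return f"{base}?{urlencode({'q': q})}"


if __name__ == "__main__":
    q = uri_query(["http://pathwaycommons.org/pc2/Protein_0d4308790e68d98cdb1ce80c706e2e0e"])
    print(search_url(q))
```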

PS: However, traverse?uri=entityUri&uri=entityUri2... seems much easier to implement and does not require re-indexing.
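The traverse alternative repeats the uri parameter once per entity. The sketch below builds such a URL; the base URL is borrowed from the search example above, and the path parameter is an assumption (the traverse command is expected to also take a BioPAX property path; the value used here is a placeholder, not a real path).

```python
from urllib.parse import urlencode


def traverse_url(uris, path, base="http://www.pathwaycommons.org/pc2/traverse"):
    """Build traverse?uri=...&uri=...&path=... (hypothetical helper).

    A list of (key, value) pairs lets urlencode repeat the 'uri' key
    in the given order.
    """
    params = [("uri", u) for u in uris] + [("path", path)]
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    # "PROPERTY_PATH" is a placeholder; supply a real BioPAX property path.
    print(traverse_url(["entityUri", "entityUri2"], "PROPERTY_PATH"))
```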

@jvwong @d2fong @gbader

IgorRodchenkov commented 6 years ago

Alright, FYI: @d2fong, @ozgunbabur, @jvwong

The PC /search command (beta.pathwaycommons.org/pc2/, the PCv10 server) now understands queries like:

(search?q=pathway:name_expr was working even before the latest modifications, but it is fuzzy and somewhat confusing to use. For example, http://beta.pathwaycommons.org/pc2/search?q=pathway:*insulin*&type=control finds all Control-type interactions that belong to any pathway (or a sub-pathway of one) whose name contains "insulin". WOW, but...

Of course, it's still possible to submit boolean queries such as ?q=pathway:"URI1" AND pathway:"URI2" (find something that belongs to both pathways) or ?q=uri:"URI1" OR uri:"URI2" (find either or both of two things by known URI), etc. Go ahead and experiment...)
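The query shapes above (an unquoted wildcard name in pathway:, a type filter, and quoted URIs joined with AND/OR) could be assembled on the client side roughly as follows. This is a sketch; the helper names are hypothetical, and URI1/URI2 are the placeholders from the text.

```python
from urllib.parse import urlencode


def field_clause(field, value, quoted=True):
    """field:"value" for exact URIs/IDs, or field:value for e.g. *wildcard* names."""
    return f'{field}:"{value}"' if quoted else f"{field}:{value}"


def combine(clauses, op="AND"):
    """Join clauses with a Lucene boolean operator (AND or OR)."""
    if op not in ("AND", "OR"):
        raise ValueError("op must be 'AND' or 'OR'")
    return f" {op} ".join(clauses)


def search_url(q, type_=None, base="http://beta.pathwaycommons.org/pc2/search"):
    """Build a /search URL; `type_` becomes the optional `type` filter."""
    params = {"q": q}
    if type_ is not None:
        params["type"] = type_
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    # Control interactions in any pathway whose name contains "insulin".
    print(search_url(field_clause("pathway", "*insulin*", quoted=False), type_="control"))
    # Something that belongs to both pathways.
    print(combine([field_clause("pathway", "URI1"), field_clause("pathway", "URI2")], "AND"))
    # Either or both of two things, by known URI.
    print(combine([field_clause("uri", "URI1"), field_clause("uri", "URI2")], "OR"))
```

Note that urlencode percent-encodes the colon and asterisks; the server decodes them back before parsing.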

URI or ID query values in uri: and pathway: fields are normally case-sensitive, but names are not.

You can also use only the "ID" (suffix) part of the URI(s) with these search fields, e.g., "R-HSA-201451" or "Protein_0d4308790e68d98cdb1ce80c706e2e0e" (just an example; it might not be in the PC10 db). Double quotes are also important for these queries, because tricky fuzzy search makes little sense in the uri: and pathway: fields, and quoting cancels the special meaning that some symbols have for the Lucene query parser.
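The sketch below takes the ID as the segment after a URI's last '/' or '#' (an assumption that matches the examples above) and wraps it in double quotes. Inside a quoted phrase, the classic Lucene query parser treats every character except the backslash and the double quote literally, so only those two are escaped. All helper names are hypothetical.

```python
import re


def id_suffix(uri):
    """Return the part after the last '/' or '#' (the whole string if neither occurs)."""
    return re.split(r"[/#]", uri.rstrip("/#"))[-1]


def quoted(value):
    # Escape backslashes first, then double quotes, and wrap the result in
    # quotes so symbols like : / * are not parsed as query syntax.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def id_query(field, uris_or_ids, op="OR"):
    """Build e.g. uri:"ID1" OR uri:"ID2" from full URIs or bare IDs."""
    return f" {op} ".join(f"{field}:{quoted(id_suffix(u))}" for u in uris_or_ids)


if __name__ == "__main__":
    print(id_query("uri", ["http://pathwaycommons.org/pc2/Protein_0d4308790e68d98cdb1ce80c706e2e0e"]))
    print(id_query("pathway", ["R-HSA-201451"]))
```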