PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

/traverse, /search sometimes return JSON result instead of XML by default (caching?) #304

Closed by IgorRodchenkov 5 years ago

IgorRodchenkov commented 5 years ago

(This was originally reported by @cannin via email)

Interesting...

Normally, one can (and should) always set the HTTP 'Accept' header in requests, e.g.:

curl -H 'Accept: application/json' -XGET "http://www.pathwaycommons.org/pc2/traverse?uri=http://identifiers.org/uniprot/P38398&path=ProteinReference/organism/displayName"

curl -H 'Accept: application/xml' -XGET "http://www.pathwaycommons.org/pc2/traverse?uri=http://identifiers.org/uniprot/P38398&path=ProteinReference/organism/displayName"

But it looks like this is indeed behaving oddly at the moment (e.g., try it from different IPs in a different order :)), so I suspect an NGINX caching issue here. Proof: with a Cache-Control request header or the 'nocache' parameter, it works:

curl -H 'Accept: application/xml' -H 'Cache-Control: nocache' -XGET "http://www.pathwaycommons.org/pc2/traverse?uri=http://identifiers.org/uniprot/P38398&path=ProteinReference/organism/displayName"

curl -H 'Accept: application/xml' -XGET "http://www.pathwaycommons.org/pc2/traverse?uri=http://identifiers.org/uniprot/P38398&path=ProteinReference/organism/displayName&nocache=1"
(this one uses the 'nocache' parameter with some 'true' value)

curl -H 'Accept: application/json' -H 'Cache-Control: nocache' -XGET "http://www.pathwaycommons.org/pc2/traverse?uri=http://identifiers.org/uniprot/P38398&path=ProteinReference/organism/displayName"

However, it has always been possible to force extension-based content negotiation with the PC2 search, top_pathways and traverse services, like this:

curl -XGET "http://www.pathwaycommons.org/pc2/traverse.xml?uri=http://identifiers.org/uniprot/P38398&path=ProteinReference/organism/displayName"

Note the "traverse.xml" there: traverse, traverse.json and traverse.xml requests are cached separately, so the formats do not get mixed up.
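For scripts, the extension-based form can be wrapped in a small helper so the format always ends up in the URL. This is only a sketch: `pc2_url` is a hypothetical name, not part of PC2, and it assumes the query string is already URL-safe.

```shell
#!/bin/sh
# pc2_url SERVICE FORMAT QUERY
# Hypothetical helper (not part of PC2): prints a URL that uses the
# extension-based form, e.g. traverse.xml, so the requested format is
# part of the URL itself rather than only in the Accept header.
pc2_url() {
  service=$1
  format=$2
  query=$3
  case "$format" in
    xml|json) ;;  # the two formats mentioned in this thread
    *) echo "unsupported format: $format" >&2; return 1 ;;
  esac
  printf 'http://www.pathwaycommons.org/pc2/%s.%s?%s\n' "$service" "$format" "$query"
}
```

Usage would look like: curl -XGET "$(pc2_url traverse xml 'uri=http://identifiers.org/uniprot/P38398&path=ProteinReference/organism/displayName')"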

PS: I would not recommend abusing cache-control and nocache options too much (definitely not with /get, /graph queries)

cannin commented 5 years ago

Thanks. This is helpful information. Is it documented on the PC site anywhere? Or is there a link to this issue?

IgorRodchenkov commented 5 years ago

... strange comment


cannin commented 5 years ago

Let's add some detail then. The documentation for the Search function states:

... which can be requested by using '.json' (e.g. '/search.json') or '.xml' extension/suffix or via HTTP request header 'Accept: application/json' (or application/json).

The traverse function (and others if applicable) has no equivalent text.

IgorRodchenkov commented 5 years ago

yeah - typo - should read ...(or application/xml)


IgorRodchenkov commented 5 years ago

After experimenting (with 'Vary'), I finally found the nginx configuration which works:

# Separate PC2 XML from JSON results when header-based content negotiation is used
map $http_accept $act {
  default "";
  "application/json" ".json";
  "application/xml" ".xml";
}
proxy_cache_key $host$request_uri$request_body$act;

This should not make anything else worse.
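To see why the extra key component separates the cached responses, here is a rough shell imitation of the map and the key. It is an illustration only: `accept_suffix` and `cache_key` are made-up names, the request body is left out on the assumption that these are GET requests, and the shell `case` is case-sensitive whereas nginx compares map strings ignoring case.

```shell
#!/bin/sh
# accept_suffix ACCEPT_VALUE
# Imitates the map block: only the two exact values get a suffix,
# anything else (including a multi-type Accept list) falls to the default "".
accept_suffix() {
  case "$1" in
    application/json) printf '.json' ;;
    application/xml)  printf '.xml' ;;
    *)                printf '' ;;
  esac
}

# cache_key HOST REQUEST_URI ACCEPT_VALUE
# Imitates $host$request_uri$request_body$act with an empty request body.
cache_key() {
  printf '%s%s%s\n' "$1" "$2" "$(accept_suffix "$3")"
}

uri='/pc2/traverse?uri=http://identifiers.org/uniprot/P38398&path=ProteinReference/organism/displayName'
cache_key www.pathwaycommons.org "$uri" application/json
cache_key www.pathwaycommons.org "$uri" application/xml
```

The two printed keys differ only in the trailing .json or .xml, so the same URL requested with different Accept values no longer shares one cache entry. An Accept value that is not exactly one of the two strings maps to the empty suffix and shares the default entry.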