con / tributors

Pay tribute to your contributors! A tool to automatically update contributor files.
https://con.github.io/tributors/
Apache License 2.0
13 stars 5 forks source link

zenodo: do not just return empty on "unauthorized" etc while doing orcid query #47

Closed yarikoptic closed 4 years ago

yarikoptic commented 4 years ago

I was trying to run

(git)lena:~datalad/datalad[add/tributors]git
$> tributors --log-level DEBUG update zenodo
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com:443
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repos/datalad/datalad/contributors HTTP/1.1" 200 None
INFO:    zenodo:Updating .zenodo.json
INFO:    zenodo:Updating .tributors cache from .zenodo.json
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pub.orcid.org:443
DEBUG:urllib3.connectionpool:https://pub.orcid.org:443 "GET /v3.0/search/?q=given-names:+AND+family-name:glalteva HTTP/1.1" 401 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pub.orcid.org:443
DEBUG:urllib3.connectionpool:https://pub.orcid.org:443 "GET /v3.0/search/?q=given-names:glalteva+AND+family-name: HTTP/1.1" 401 None

to only realize that 401 is unauthorized, so it should have just bailed out right away announcing that I forgot to provide token. Code has

    if response.status_code != 200:
        return

so just returns upon any non-successfull attempt. It should be more particular IMHO. In the future might even want to retry on some server side fails (5xx code) etc.

yarikoptic commented 4 years ago
$> tributors --version                                             
0.0.18
yarikoptic commented 4 years ago

I jinxed it... If I reexport ORCID_SECRET and ORCID_ID (forgot where I stored the token). search query comes back with 500:

> /home/yoh/proj/tributors/tributors/main/orcid.py(143)record_search()
-> if response.status_code != 200:
(Pdb) p response.status_code
500
(Pdb) p response.text
'{"response-code":500,"developer-message":"org.apache.solr.client.solrj.impl.HttpSolrClient.RemoteSolrException Full validation error: Error from server at http://solr-loc.orcid.org/solr/profile: org.apache.solr.search.SyntaxError: Cannot parse \'given-names: AND family-name:glalteva\': Encountered \\" <AND> \\"AND \\"\\" at line 1, column 13.\\nWas expecting one of:\\n    <BAREOPER> ...\\n    \\"(\\" ...\\n    \\"*\\" ...\\n    <QUOTED> ...\\n    <TERM> ...\\n    <PREFIXTERM> ...\\n    <WILDTERM> ...\\n    <REGEXPTERM> ...\\n    \\"[\\" ...\\n    \\"{\\" ...\\n    <LPARAMS> ...\\n    \\"filter(\\" ...\\n    <NUMBER> ...","user-message":"Something went wrong in ORCID.","error-code":9008,"more-info":"https://members.orcid.org/api/resources/troubleshooting"}'

so it seems that syntax is incorrect.

yarikoptic commented 4 years ago

yeap, removing empty part, it works:

(Pdb) response = requests.get("https://pub.orcid.org/v3.0/search/?q=family-name:glalteva", headers={"Authorization": "bearer %s" % token, "Accept": "application/json"})
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pub.orcid.org:443
DEBUG:urllib3.connectionpool:https://pub.orcid.org:443 "GET /v3.0/search/?q=family-name:glalteva HTTP/1.1" 200 None
(Pdb) p response.content
b'{"result":null,"num-found":0}'

so please sanitize input into search... sorry for piling up unrelated issues into one. Ideally -- please add a unittest, so if there is token known, it would perform a number of queries for get_orcid for known entities where you know expected output. vcr library could be used to cache it locally at least

vsoch commented 4 years ago

The orcid search is so messy we shouldn't allow it to happen if there isn't a first and last name. I'll break in this case.

vsoch commented 4 years ago

See https://github.com/con/tributors/pull/48.