adsabs / adsabs-dev-api

Developer API service description and example client code

Punctuation characters in DOI trigger syntax error #43

Closed: ms609 closed this issue 6 years ago

ms609 commented 6 years ago

Possibly related to #15? Passing a DOI that contains parentheses causes a syntax error.

curl -H %TOKEN% "https://api.adsabs.harvard.edu/v1/search/query?data_type=XML&q=doi:10.1206/0003-0082(2008)3610[1:nrofwf]2.0.co;2" fails (as expected: not URL-encoded).

curl -H %TOKEN% "https://api.adsabs.harvard.edu/v1/search/query?data_type=XML&q=doi:10.1206%2F0003-0082%282008%293610%5B1%3Anrofwf%5D2.0.co%3B2" fails with error:

INVALID_SYNTAX_CANNOT_PARSE: Syntax Error, cannot parse doi:10.1206/0003-0082(2008)3610[1:nrofwf]2.0.co;2: The parser reported a syntax error, antlrqueryparser hates errors!

This is not expected: doi=10.1038%2Fnature09068, with the / URL-encoded, is successful.

Double-encoding the DOI: curl -H %TOKEN% "https://api.adsabs.harvard.edu/v1/search/query?data_type=XML&q=doi:10.1206%252F0003-0082%25282008%25293610%255B1%253Anrofwf%255D2.0.co%253B2" returns a response header, but the DOI in it is still URL-encoded (and so the search returns no results).
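A minimal sketch of why the double-encoded request comes back with an encoded DOI, assuming the server URL-decodes the query string exactly once (that assumption fits the behaviour described above, but is not confirmed by the API documentation here). It uses only Python's standard `urllib.parse`; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

doi = "10.1206/0003-0082(2008)3610[1:nrofwf]2.0.co;2"

# Percent-encode every reserved character (safe="" also encodes "/").
encoded_once = quote(doi, safe="")
# Encoding again turns each "%" into "%25".
encoded_twice = quote(encoded_once, safe="")

# Assumption: the server applies a single round of URL-decoding.
seen_after_single = unquote(encoded_once)   # the original DOI
seen_after_double = unquote(encoded_twice)  # still percent-encoded

print(encoded_once)
print(encoded_twice)
print(seen_after_single == doi)
print(seen_after_double == encoded_once)
```

Under that assumption, the single-encoded request hands the raw DOI (parentheses, brackets and all) to the query parser, while the double-encoded request hands it a percent-encoded string that matches no indexed DOI.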

aaccomazzi commented 6 years ago

Wow, that is one ugly DOI!

The solution to the problem (and the generally recommended practice) is to wrap your search token in double quotes, i.e. use this as the search string: doi:"10.1206/0003-0082(2008)3610[1:nrofwf]2.0.co;2" You can test this in our UI to verify that you no longer get an error from the parser: https://ui.adsabs.harvard.edu/#search/q=doi%3A%2210.1206%2F0003-0082(2008)3610%5B1%3Anrofwf%5D2.0.co%3B2%22&sort=date%20desc%2C%20bibcode%20desc&p_=0

Turning this into curl with proper URL-encoding (the encoded quotes, %22, wrap only the DOI value, matching the search string doi:"..."): curl -H %TOKEN% 'https://api.adsabs.harvard.edu/v1/search/query?q=doi:%2210.1206%2F0003-0082%282008%293610%5B1%3Anrofwf%5D2.0.co%3B2%22'

Incidentally, the data_type=XML parameter can be dropped, since it is ignored (all results come back as JSON from this API).
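For anyone building the request in code rather than by hand, the quoted-DOI query can be sketched in Python as follows. This only constructs the URL and does not send it; `build_doi_query_url` is a hypothetical helper name, and the endpoint is the one used in the curl commands in this thread. The request itself would still need the token header shown as %TOKEN% above.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the curl commands in this thread.
API_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def build_doi_query_url(doi: str) -> str:
    """Return a search URL with the DOI wrapped in double quotes.

    Hypothetical helper: quoting keeps characters such as ( ) [ ] : ;
    from being interpreted as query syntax by the parser.
    """
    query = f'doi:"{doi}"'
    # quote_via=quote percent-encodes spaces as %20 rather than "+";
    # urlencode's default safe="" means "/" and ":" are encoded too.
    return f"{API_URL}?{urlencode({'q': query}, quote_via=quote)}"


url = build_doi_query_url("10.1206/0003-0082(2008)3610[1:nrofwf]2.0.co;2")
print(url)
```

The encoding is applied exactly once, after the quotes are added, so the parser receives doi:"..." with the quotes intact. The data_type parameter is left out, since it is ignored.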