CrossRef / rest-api-doc

Documentation for Crossref's REST API. For questions or suggestions, see https://community.crossref.org/

HttpSolrClient$RemoteSolrException: Unable to parse 'cursorMark' after totem: value must either be '*' or the 'nextCursorMark' #236

Closed martin-amaya-olx closed 7 years ago

martin-amaya-olx commented 7 years ago

Guys, sometimes I get the following exception when using cursors:

{
  "status": "error",
  "message-type": "exception",
  "message-version": "1.0.0",
  "message": {
    "name": "class org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException",
    "description": "org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http:\/\/mds4:8983\/solr\/crmds1: Unable to parse 'cursorMark' after totem: value must either be '*' or the 'nextCursorMark' returned by a previous search: AoJ43IS53pACPyBodHRwOi8vZHguZG9pLm9yZy8xMC4xMTMwLzAwMTYtNzYwNigxOTc0KTg1PDgzOTpjY3Nhc2k Mi4wLmNvOzI=",
    "message": "Error from server at http:\/\/mds4:8983\/solr\/crmds1: Unable to parse 'cursorMark' after totem: value must either be '*' or the 'nextCursorMark' returned by a previous search: AoJ43IS53pACPyBodHRwOi8vZHguZG9pLm9yZy8xMC4xMTMwLzAwMTYtNzYwNigxOTc0KTg1PDgzOTpjY3Nhc2k Mi4wLmNvOzI=",
    "cause": null
  }
}

For example, with this cursor: "AoMFAAAAAAAAAAB467m\u002FtNwCKjEwMDk1NjMxOTc".

Could you tell me why Solr generates cursors with special characters? It doesn't make sense. I mean, in order to avoid issues, the cursor should only be created with alphabetic characters.

regards.

eltermann commented 7 years ago

@martin-amaya-olx, cursors are not guaranteed to be alphanumeric. You should expect other characters, such as "+", "/" and "=".

In your case, you are not encoding a plus sign ("+") and it is being replaced by a blank space.

Original token: "AoJ43IS53pACPyBodHRwOi8vZHguZG9pLm9yZy8xMC4xMTMwLzAwMTYtNzYwNigxOTc0KTg1PDgzOTpjY3Nhc2k+Mi4wLmNvOzI="

What Solr is receiving: "AoJ43IS53pACPyBodHRwOi8vZHguZG9pLm9yZy8xMC4xMTMwLzAwMTYtNzYwNigxOTc0KTg1PDgzOTpjY3Nhc2k Mi4wLmNvOzI="

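You can fix this by URL-encoding the cursor value before adding it to the URL. In Python 3, this should look like the following (in Python 2, the same function is urllib.quote_plus):

from urllib.parse import quote_plus
url = "https://api.crossref.org/yourendpoint?yourparameters&cursor=" + quote_plus(cursor)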
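To see the effect for yourself, here is a minimal sketch that decodes the query string the way a server typically would and compares the result with the original token. The "/works" path, the "rows" parameter and the helper name received_cursor are assumptions for illustration only; substitute your own endpoint and parameters.

```python
from urllib.parse import parse_qs, quote_plus, urlencode, urlsplit

# Cursor taken from the error message in this thread; note the "+" near the end.
cursor = (
    "AoJ43IS53pACPyBodHRwOi8vZHguZG9pLm9yZy8xMC4xMTMwLzAwMTYtNzYwNigx"
    "OTc0KTg1PDgzOTpjY3Nhc2k+Mi4wLmNvOzI="
)

# Hypothetical endpoint and parameters, used only for illustration.
base = "https://api.crossref.org/works"


def received_cursor(url):
    """Return the cursor value obtained by decoding the URL's query string."""
    return parse_qs(urlsplit(url).query)["cursor"][0]


# 1. Cursor pasted into the URL as-is: "+" is decoded as a space.
raw_url = base + "?rows=100&cursor=" + cursor

# 2. Cursor encoded by hand with quote_plus: "+" becomes "%2B".
encoded_url = base + "?rows=100&cursor=" + quote_plus(cursor)

# 3. Whole query string built with urlencode, which encodes every value.
built_url = base + "?" + urlencode({"rows": 100, "cursor": cursor})

for label, url in [("raw", raw_url), ("quote_plus", encoded_url), ("urlencode", built_url)]:
    # True means the decoded value is identical to the original cursor.
    print(label, received_cursor(url) == cursor)
```

Building the query string with urlencode (or letting your HTTP client encode the parameters) is usually the safer choice, because every value is encoded and not only the cursor.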