CrossRef / rest-api-doc

Documentation for Crossref's REST API. For questions or suggestions, see https://community.crossref.org/

Cursor disconnect RemoteSolrException Unable to parse 'cursorMark' after totem: value must either be '*' or the 'nextCursorMark' returned by a previous search #490

Open WolfgangFahl opened 4 years ago

WolfgangFahl commented 4 years ago

#427 already points to an issue with cursors. With my script:

```bash
#
# download from crossref RESTful API via cursor
#
downloadWithCursor() {
  local l_rows="$1"
  local l_index="$2"
  local l_cursor="$3"
  target=$sampledir/crossref-$l_index.json
  src="https://api.crossref.org/types/proceedings/works?select=event,title,DOI&rows=$l_rows&cursor=$l_cursor"
  download $src $target
}

#
# get Crossref data
# see also https://github.com/TIBHannover/confIDent-dataScraping
#
getCrossRef() {
  rows=1000
  index=1
  totalRows=0
  # force while entry
  total=$rows
  downloadWithCursor $rows $index "*"
  while [ $totalRows -lt $total ]
  do
    target=$sampledir/crossref-$index.json
    status=$(jq '.status' $target | tr -d '"')
    total=$(jq '.message["total-results"]' $target)
    # get and remove quotes from cursor
    cursor=$(jq '.message["next-cursor"]' $target | tr -d '"')
    startindex=$(jq '.message.query["start-index"]' $target)
    perpage=$(jq '.message["items-per-page"]' $target)
    index=$[$index+1]
    if [ "$status" == "ok" ]
    then
      totalRows=$[$totalRows+$rows]
    else
      # force while exit
      totalRows=1
      total=0
      # remove invalid
      mv $target $target.err
    fi
    echo "status: $status index: $index $totalRows of $total startindex: $startindex perpage=$perpage cursor:$cursor"
    if [ $totalRows -lt $total ]
    then
      # wait a bit
      sleep 2
      downloadWithCursor $rows $index "$cursor"
    fi
  done
  cat $sampledir/crossref-*.json | jq .message.items[].title | cut -f2 -d'[' | cut -f2 -d'"' | grep -v "]" | tr -s '\n' > $sampledir/proceedings-crossref.txt
}
```

I run into a similar issue:

{
  "status": "error",
  "message-type": "exception",
  "message-version": "1.0.0",
  "message": {
    "name": "class org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException",
    "description": "org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http:\/\/mds3:8984\/solr\/crmds1: Unable to parse 'cursorMark' after totem: value must either be '*' or the 'nextCursorMark' returned by a previous search: AoJ4 NDNi\/ECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4xNzc1OC9laXJhaTU=",
    "message": "Error from server at http:\/\/mds3:8984\/solr\/crmds1: Unable to parse 'cursorMark' after totem: value must either be '*' or the 'nextCursorMark' returned by a previous search: AoJ4 NDNi\/ECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4xNzc1OC9laXJhaTU=",

jq . *.err | grep "search:" | cut -f7 -d:

gives me:

 value must either be '*' or the 'nextCursorMark' returned by a previous search
 AoJ7o 7Hk/ECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4xMTQ1LzI1MzA1NDQ=",
 value must either be '*' or the 'nextCursorMark' returned by a previous search
 AoJ3pL 1svECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4xMTQ1LzExMzg5NTM=",
 value must either be '*' or the 'nextCursorMark' returned by a previous search
 AoJ teyWtfECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4zMTE1LzEyMjU3MzM=",
 value must either be '*' or the 'nextCursorMark' returned by a previous search
 AoJx6NyU0 8CPwhodHRwOi8vZHguZG9pLm9yZy8xMC4xMDYxLzk3ODA3ODQ0ODEwMTE="

So I suspect the space in the token is the issue.

Please document what kind of encoding you expect or, better, fix the upstream library to use tokens that need no encoding (no spaces). Improving the error message and pointing to the FAQ would also be helpful.

To close this issue, please let me know whether my space assumption is right and whether replacing the space with "+" will fix the problem.
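
The space assumption can be illustrated with a small sketch. With form-style query decoding, a literal `+` in a query string is read as a space, so a cursor pasted raw into the URL could reach the server altered. The helper name `form_decode_plus` is hypothetical, the sample token is the one from the error message with its space turned back into `+`, and whether the Crossref server decodes this way is an assumption:

```shell
# Hypothetical helper: mimics form-style query decoding, where '+' means space.
# Assumption: this is roughly what happens to the raw cursor on the server side.
form_decode_plus() {
  printf '%s\n' "$1" | tr '+' ' '
}

# Token as it would look in "next-cursor" if the space was originally a '+'.
sent='AoJ4+NDNi/ECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4xNzc1OC9laXJhaTU='

# Prints the token with a space in place of '+', as seen in the error message.
form_decode_plus "$sent"
```

If this is what happens, putting a `+` back in place of the space would probably not help on its own, because the `+` would be decoded to a space again; it would have to be sent as `%2B`.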

WolfgangFahl commented 4 years ago

cursor=$(jq '.message["next-cursor"]' $target | tr -d '"' | python -c "import urllib.parse;print (urllib.parse.quote(input()))")

fixes the issue see
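
The same percent-encoding can also be done without Python. The following is a minimal sketch, not a definitive implementation: the function name `urlencode` is hypothetical, it assumes the cursor contains only ASCII characters (which holds for the base64-style tokens shown above), and it encodes every character outside the RFC 3986 unreserved set, so it is slightly stricter than `urllib.parse.quote`, which leaves `/` alone by default.

```shell
# Hypothetical helper: percent-encode an ASCII string for use in a query parameter.
# The function body runs in a subshell, so its variables do not leak to the caller.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;                 # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;         # everything else: %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Example with a shortened, made-up token containing '+', '/' and '='.
urlencode 'AoJ4+NDNi/ECPw=='
```

In the script above this could be used as `cursor=$(urlencode "$(jq -r '.message["next-cursor"]' "$target")")`, where `jq -r` prints the raw string and makes the `tr -d '"'` step unnecessary.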