ElsevierDev / elsapy

A Python module for use with Elsevier's APIs: Scopus, ScienceDirect, others.
http://api.elsevier.com
BSD 3-Clause "New" or "Revised" License

Pagination breaks when returned @next value contains a '+' #47

Closed: sheth7 closed this issue 4 years ago

sheth7 commented 4 years ago

I am working with the ScopusSearch API. My code uses the cursor and loops through result pages by extracting the '@next' key. However, this breaks when the @next key value contains a '+'.

I have checked that my code treats both the cursor variable and the updated cursor variable as str everywhere.

Can you please suggest what may be happening?

I am attaching outputs for a few different queries to show that it breaks only when the @next key value contains a '+'.

Output 1: [screenshot: Screen Shot 2020-01-24 at 1 22 34 PM]

Output 2: [screenshot: Screen Shot 2020-01-24 at 1 21 40 PM]

Output 3: [screenshot: Screen Shot 2020-01-24 at 1 22 16 PM]

Thanks!

sheth7 commented 4 years ago

Debugging credit: Dave Santucci from Scopus.

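This problem is due to URL encoding. An unencoded '+' in a query string is typically decoded as a space, so the server receives a cursor that differs from the one it returned. If you use the cursor parameter to navigate pages, URL-encode the cursor string before passing it to the next query.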

from urllib.parse import quote

Assuming the cursor value returned by the API (the '@next' value) is stored in a cursor variable and the returned page is stored in a page variable:

next_cursor = quote(cursor)

This should solve the problem.
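To make the encoding issue concrete, here is a minimal sketch using only the standard library. The cursor value is made up for illustration, the query term is arbitrary, and the endpoint URL is shown only as an example of where the parameter would go; no request is sent. It uses parse_qs to mimic how a server that form-decodes the query string would read the cursor, and it passes safe="" to quote so that '/' is escaped as well, which is a stricter choice than the default.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical cursor containing '+', '/' and '='; real values come from the API.
cursor = "AoJ+abc/def=="

# Illustrative endpoint only; nothing is requested in this sketch.
base = "https://api.elsevier.com/content/search/scopus"

# Unencoded: form-decoding the query string turns '+' into a space.
raw_url = f"{base}?query=heart&cursor={cursor}"
raw_seen = parse_qs(urlsplit(raw_url).query)["cursor"][0]

# Encoded: '+' becomes %2B (and, with safe="", '/' becomes %2F).
next_cursor = quote(cursor, safe="")
encoded_url = f"{base}?query=heart&cursor={next_cursor}"
encoded_seen = parse_qs(urlsplit(encoded_url).query)["cursor"][0]

print(repr(raw_seen))      # the cursor as decoded without encoding
print(next_cursor)         # the percent-encoded cursor
print(encoded_seen == cursor)
```

If you build requests with a library that encodes query parameters for you, pass the raw cursor to it instead, so that the value is not encoded twice.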

Another option is to extract the next-page URI and pass it directly in the next call. page['search-results']['link'] is a list of link entries, and the one you want is the entry whose '@ref' is "next"; its '@href' holds the URI.

The code for this would be something like: next_URI = [item['@href'] for item in page['search-results']['link'] if item['@ref'] == "next"]

This gives a list, so use next_URI[0] for the URI itself; the list is empty when there is no "next" entry.

FYI - I have not yet tested the code above.
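As a rough sketch of the next-link approach, the helper below scans the link entries and returns the '@href' of the one marked "next". The function name next_page_url and the sample page are hypothetical and the hrefs are placeholders; the sketch assumes only the structure described above, where 'link' is a list of dicts with '@ref' and '@href' keys.

```python
from typing import Optional


def next_page_url(page: dict) -> Optional[str]:
    """Return the '@href' of the link whose '@ref' is 'next', or None if absent."""
    links = page.get("search-results", {}).get("link", [])
    for item in links:
        if item.get("@ref") == "next":
            return item.get("@href")
    return None


# Hypothetical response fragment with placeholder URLs.
sample_page = {
    "search-results": {
        "link": [
            {"@ref": "self", "@href": "https://example.org/search?cursor=*"},
            {"@ref": "next", "@href": "https://example.org/search?cursor=AoJ%2Babc"},
        ]
    }
}

# A page with no "next" entry, as might be seen at the end of the results.
final_page = {
    "search-results": {
        "link": [{"@ref": "self", "@href": "https://example.org/search?cursor=AoJ%2Babc"}]
    }
}

print(next_page_url(sample_page))
print(next_page_url(final_page))
```

Returning None when no "next" entry exists gives a pagination loop a simple stopping condition, and reusing the href as returned avoids re-encoding the cursor by hand.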