eblondel / zen4R

zen4R - R Interface to Zenodo REST API
https://github.com/eblondel/zen4R/wiki

getRecords ElasticSearch query returns error if it includes spaces #75

Closed: irmoodie closed this issue 2 years ago

irmoodie commented 2 years ago

Issue:

Passing an ElasticSearch query to getRecords produces an error when the query string contains a space.

How to reproduce:

library(zen4R)
zenodo <- ZenodoManager$new()
my_zenodo_records <- zenodo$getRecords(q = "test issue")

Returns:

> my_zenodo_records <- zenodo$getRecords(q = "test issue")
Error: lexical error: invalid char in json text.
                                       <html><body><h1>400 Bad request
                     (right here) ------^

If I instead search without spaces (e.g. q = "test"), I get the expected result: a list of ZenodoRecord objects. Is there something I'm missing here? Should the space be formatted differently?

I’ve included my session information below.

Thanks!

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    
system code page: 65001

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] zen4R_0.6

loaded via a namespace (and not attached):
 [1] xml2_1.3.3       knitr_1.38       magrittr_2.0.3  
 [4] tidyselect_1.1.2 R6_2.5.1         rlang_1.0.2     
 [7] fastmap_1.1.0    fansi_1.0.3      httr_1.4.2      
[10] dplyr_1.0.8      tools_4.1.2      xfun_0.30       
[13] utf8_1.2.2       cli_3.3.0        DBI_1.1.2       
[16] htmltools_0.5.2  ellipsis_0.3.2   yaml_2.3.5      
[19] assertthat_0.2.1 digest_0.6.29    tibble_3.1.6    
[22] lifecycle_1.0.1  crayon_1.5.1     keyring_1.3.0   
[25] purrr_0.3.4      vctrs_0.4.1      curl_4.3.2      
[28] glue_1.6.2       evaluate_0.15    rmarkdown_2.11  
[31] compiler_4.1.2   pillar_1.7.0     generics_0.1.2  
[34] jsonlite_1.8.0   pkgconfig_2.0.3
eblondel commented 2 years ago

@irmoodie I've URL-encoded the query, so if you re-install from GitHub it should work now.
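To illustrate what URL-encoding the query means here, a minimal sketch in base R. It assumes `utils::URLencode()` for the encoding; this is not necessarily what zen4R uses internally, and the URL construction is only illustrative. The space becomes `%20`, the form in which Zenodo accepts it in the `q` parameter.

```r
# Minimal sketch (not zen4R's actual implementation): percent-encode a
# search query before placing it in a Zenodo records URL.
query <- "test search"

# reserved = TRUE also escapes characters such as '&' and '=', so they
# cannot be mistaken for query-string separators.
encoded <- utils::URLencode(query, reserved = TRUE)
print(encoded)

# Illustrative URL construction only; parameters mirror the request
# shape seen in the debug log of this issue.
url <- paste0("https://zenodo.org/api/records/?q=", encoded, "&size=10&page=1")
print(url)
```

Note that `URLencode()` leaves a string unchanged if it already contains a `%XX` sequence (unless `repeated = TRUE`), which helps avoid double-encoding.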

irmoodie commented 2 years ago

@eblondel I've re-installed the package from GitHub, but I still get the same error with the code I posted above. I also tried a fresh R install on a Linux machine, and the same error occurs whenever the search query contains a space. Let me know if there's anything else I can try to troubleshoot this.

To reproduce:

install.packages("remotes")
remotes::install_github("eblondel/zen4R")
library(zen4R)
zenodo <- ZenodoManager$new()
my_zenodo_records <- zenodo$getRecords(q = "test search")

Returns:

Error: lexical error: invalid char in json text.
                                       <html><body><h1>400 Bad request
                     (right here) ------^

Linux session info:

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8      
 [8] LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] zen4R_0.6

loaded via a namespace (and not attached):
 [1] httr_1.4.2       compiler_4.1.3   keyring_1.3.0    assertthat_0.2.1 R6_2.5.1         tools_4.1.3      curl_4.3.2       remotes_2.4.2    xml2_1.3.3      
[10] jsonlite_1.8.0  

Windows session info:

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] zen4R_0.6

loaded via a namespace (and not attached):
[1] httr_1.4.2       compiler_4.1.2   keyring_1.3.0    assertthat_0.2.1 R6_2.5.1        
[6] tools_4.1.2      curl_4.3.2       xml2_1.3.3       jsonlite_1.8.0  
eblondel commented 2 years ago

Can you enable the logger in ZenodoManager so we can see the request that is sent to Zenodo?

zenodo <- ZenodoManager$new(logger = "DEBUG")
irmoodie commented 2 years ago

@eblondel Sure, here's the output:

> zenodo <- ZenodoManager$new(logger = "DEBUG")
> my_zenodo_records <- zenodo$getRecords(q = "test search")
[zen4R][INFO] ZenodoRequest - Fetching https://zenodo.org/api/records/?q=test%20search&size=10&page=1 
-> GET /api/records/?q=test%20search&size=10&page=1 HTTP/1.1
-> Host: zenodo.org
-> User-Agent: libcurl/7.64.1 r-curl/4.3.2 httr/1.4.2
-> Accept-Encoding: deflate, gzip
-> Accept: application/json, text/xml, application/xml, */*
-> Authorization: Bearer 
-> 
<- HTTP/1.1 200 OK
<- Server: nginx
<- Date: Tue, 26 Apr 2022 14:34:03 GMT
<- Content-Type: application/json
<- Transfer-Encoding: chunked
<- Vary: Accept-Encoding
<- Link: <https://zenodo.org/api/records/?sort=bestmatch&q=test+search&page=1&size=10>; rel="self", <https://zenodo.org/api/records/?sort=bestmatch&q=test+search&page=2&size=10>; rel="next"
<- X-RateLimit-Limit: 60
<- X-RateLimit-Remaining: 59
<- X-RateLimit-Reset: 1650983704
<- Retry-After: 60
<- X-Frame-Options: sameorigin
<- X-XSS-Protection: 1; mode=block
<- X-Content-Type-Options: nosniff
<- Strict-Transport-Security: max-age=0
<- Referrer-Policy: strict-origin-when-cross-origin
<- Access-Control-Allow-Origin: *
<- Access-Control-Expose-Headers: Content-Type, ETag, Link, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
<- X-Request-ID: dfb8bc9b84a6a86f324649fd1bbdaa6b
<- Content-Encoding: gzip
<- 
[zen4R][INFO] ZenodoManager - Successfully fetched list of published records - page 1 
[zen4R][INFO] ZenodoRequest - Fetching https://zenodo.org/api/records/?q=test search&size=10&page=2 
-> GET /api/records/?q=test search&size=10&page=2 HTTP/1.1
-> Host: zenodo.org
-> User-Agent: libcurl/7.64.1 r-curl/4.3.2 httr/1.4.2
-> Accept-Encoding: deflate, gzip
-> Accept: application/json, text/xml, application/xml, */*
-> Authorization: Bearer 
-> 
<- HTTP/1.0 400 Bad request
<- Cache-Control: no-cache
<- Connection: close
<- Content-Type: text/html
<- 
Error: lexical error: invalid char in json text.
                                       <html><body><h1>400 Bad request
                     (right here) ------^
> 
eblondel commented 2 years ago

OK, I see: I forgot to URL-encode the query when requesting the next pages in getRecords. Re-install now; it should be OK this time :-)
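The debug log shows why the first fix was incomplete. Page 1 was requested with `q=test%20search`, but page 2 went out with a raw space. Zenodo rejected that request with a 400 HTML page, which jsonlite then failed to parse as JSON. Below is a minimal sketch of the general pattern: encode the query once and reuse the encoded value for every page request. It assumes a hypothetical helper, `build_page_urls()`, which is not part of zen4R.

```r
# Hypothetical helper (not part of zen4R): build the request URL for each
# page of a records search, reusing the encoded query on every page.
build_page_urls <- function(query, size = 10, pages = 1:2) {
  # Encode once, up front, so no page can be built from the raw query.
  encoded <- utils::URLencode(query, reserved = TRUE)
  vapply(pages, function(p) {
    sprintf("https://zenodo.org/api/records/?q=%s&size=%d&page=%d",
            encoded, size, p)
  }, character(1))
}

urls <- build_page_urls("test search")
print(urls)

# Cheap regression check: no generated URL may contain a literal space.
stopifnot(!any(grepl(" ", urls, fixed = TRUE)))
```

Deriving every page URL from one encoded value, rather than re-assembling the query separately for the first and subsequent pages, removes the chance of one code path skipping the encoding step.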

irmoodie commented 2 years ago

@eblondel Solved! Thank you for your help and for the package!

eblondel commented 2 years ago

You're welcome!