Status: Closed (dobsonk2 closed this issue 3 years ago)
Hi Kara - sounds like it could have to do with a connection error to the online resource, but I'm not sure. @billspat can probably figure this out
Hi Kara - I will be able to take a look at this on Monday. I don't think it's a connection error, but something related to how the request is being formed by the scopus library, or perhaps the DOI needs to be in some funky format to work. I'll dig into the source code and figure this out.
@dobsonk2:
1. Is the API key in the code one that you applied for from Scopus, or did you get it from somewhere else?
2. When you applied for an API key, what did you put for the Website URL?
3. Are you signed into the MSU VPN when you are running your code?
The section "API Keys and IP Addresses" in the document https://cran.r-project.org/web/packages/rscopus/vignettes/api_key.html says you must be using the VPN or a computer on campus to use it.
I have not had a chance to try it yet but I have an API key to try.
Also, I will add code to hide the API key, as it should not go into a GitHub repository (even a private one).
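One common way to keep a key out of a repository is to read it from an environment variable at run time. The sketch below is in Python only for brevity; in R the same idea uses `Sys.getenv()`. The variable name `ELSEVIER_API_KEY` and the helper `load_api_key` are hypothetical names chosen for illustration, not something the project already defines.

```python
import os


def load_api_key(env_var="ELSEVIER_API_KEY", environ=None):
    """Return the API key stored in an environment variable.

    `env_var` is a hypothetical variable name; `environ` lets a caller
    pass a plain dict instead of the real process environment.
    """
    env = os.environ if environ is None else environ
    key = env.get(env_var, "").strip()
    if not key:
        # Fail loudly rather than sending a request with an empty key.
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the script."
        )
    return key
```

With this approach the script only ever contains the variable name, and the key itself lives in the shell profile or another file that is not committed.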
@billspat
1. I got the API key through Elsevier (https://dev.elsevier.com/).
2. I used http://example.com for my website, which they said was okay to use if you didn't have a website.
3. No - I'll try to find those instructions for using the MSU VPN and get back to you on whether that works or not.
Also - I'll be sure to update my code so that my API key doesn't show :)
@billspat I signed into the MSU VPN and tried my same code, but I still got the same error. I'm using the browser client, not the thick client. I'm not entirely sure of the difference between the two - regardless, I'll keep trying to see if I can get it to work. Thanks for your help with this!
I've looked into this more, and I couldn't get it to work.
I am also using the VPN (the 'fat client', which runs on the Mac instead of in the browser), and that may be part of the problem, but I don't know. The VPN has changed recently and it may not function properly with Scopus. One way around that is to use the HPCC, which is on campus, and I'll try that.
However, after digging in, I see the R library is creating malformed URLs, which is why you get the 404 (resource not found) error. I'm sorry I suggested this R library, 'rscopus' (https://github.com/muschellij2/rscopus), which says it hasn't been updated since 2018.
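For comparison when debugging, here is a minimal sketch of what a well-formed retrieval request might look like. It assumes the endpoint layout `https://api.elsevier.com/content/<resource>/doi/<DOI>` and the `X-ELS-APIKey` header; both are assumptions to check against the current Elsevier documentation, and the thread does not confirm them. The helper name `build_request` is hypothetical and the DOI in the usage note is a made-up placeholder. It is written in Python rather than R so it stays independent of rscopus.

```python
from urllib.parse import quote

# Assumed base URL for the Elsevier content APIs; verify against their docs.
BASE_URL = "https://api.elsevier.com/content"


def build_request(doi, api_key, resource="article"):
    """Return (url, headers) for retrieving one document by DOI.

    `resource` is assumed to be "article" (full text) or "abstract".
    """
    doi = doi.strip()
    # Strip common DOI prefixes so only the bare identifier is sent.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Percent-encode unusual characters but keep "/" as a path separator.
    url = f"{BASE_URL}/{resource}/doi/{quote(doi, safe='/')}"
    headers = {"X-ELS-APIKey": api_key, "Accept": "text/xml"}
    return url, headers
```

Printing the URL from `build_request("doi:10.1000/example(1)", key)` and setting it next to the URL that rscopus generates should show where the two differ.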
I think that Elsevier has had a new API since 2018, so the library may not be using the correct one. I'm digging into the Elsevier documentation and it's pretty terrible. The Scopus docs don't give any real examples, and I think you need to know how to use the generic Elsevier API and then apply the Scopus info? I wrote to their help desk and we'll see what I get back.
Using an API for this great resource is definitely the way to go, but we need more information and better documentation. If @plzmsu thinks Scopus is a worthwhile resource for this then I'll pursue trying to get help using it, so don't waste your time until we learn more!
The original reason to use Scopus was that Web of Science and other options had a limit on how many PDFs you could download. Scopus didn't. We did the downloading manually, saved all the PDFs on the HPCC, and then did the R text mining. I think also that Scopus allowed you to search the entire text rather than just the abstract, though I may be misremembering. I am indifferent about Scopus vs. WOS or something else - it just needs to allow us to do a pretty thorough search and download all the PDFs.
@dobsonk2 : since you've manually downloaded the papers, and it seems like the Scopus API may be more work than it's worth, do you want to close this issue?
Here's the script I created to attempt to download files from Scopus via their API: https://github.com/SpaCE-Lab-MSU/warmXtrophic/blob/master/kara/Scopus_XML_download.R
I'm running into issues when trying to import the files. I selected a random paper from my Scopus search, and I've attempted to retrieve the article with the two methods below:
Both of these give this message:
This happens for every DOI I attempt it with, as well as with EID identifiers.
I tried troubleshooting the issue online, but no luck. If it's easier to find a solution that way, we could always discuss this in our weekly meeting or schedule another data-focused meeting.