crew102 / wosr

R clients to the Web of Science and Incites APIs
https://crew102.github.io/wosr/

all emails NA #16

Closed treeman21 closed 4 years ago

treeman21 commented 4 years ago

First, thanks for this very nice package. It gives me almost exactly what I need, but all values in the email column of the author CSV are coming back as NA. Any advice would be greatly appreciated. Here is my call: pull_wos(query = 'WC=("Biodiversity and Conservation" OR "Ecology" OR "Multidisciplinary Sciences") AND TS=("biodiversity") AND PY=(2010-2019)',editions = c("SCI"),sid = auth(username=NULL,password=NULL)) Thank you in advance for your help.

treeman21 commented 4 years ago

I should clarify that I just need the email(s) for the corresponding author(s) of each paper, which WoS gives in the address field. It might not be possible to associate each email with a name in your author table, which is okay. For my purposes, which involve sending a survey via email to a group of experts identified through a repeatable WoS search, I only need the list of emails for the corresponding author(s) of each paper, even if not matched with their names. In case it helps, the ex_email function in the qdapRegex package seems to provide one of several effective ways to extract the emails from the address field text. Thanks again for your help.
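To illustrate the regex approach mentioned above, here is a minimal base-R sketch. It is not the qdapRegex implementation: the sample address string, the `email_pattern` expression, and the `extract_emails` helper are all made up for illustration, and real WoS address text may be formatted differently.

```r
# Hypothetical address-field text; real WoS records will look different.
address <- paste(
  "Smith, J (corresponding author), Univ Example, Dept Ecol, Anytown, USA.;",
  "jsmith@example.edu; a.lee@example.org"
)

# Deliberately simple email pattern (an assumption, not qdapRegex's pattern).
email_pattern <- "[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"

# Return the unique, lower-cased email addresses found in a character vector.
extract_emails <- function(text) {
  matches <- regmatches(text, gregexpr(email_pattern, text))
  unique(tolower(unlist(matches)))
}

emails <- extract_emails(address)
print(emails)
```

Because the helper accepts a character vector, it could be applied to a whole column of address text at once, although the emails would then no longer be tied to individual publications unless the call is made row by row.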

crew102 commented 4 years ago

Hi @treeman21, thanks for bringing this to my attention. I actually don't have access to the WoS API anymore (I left my old job, where I had a set of credentials). I'd be happy to help fix this issue if you can assist, though. Let me know if you're willing, and I'll send you some instructions for what exactly I'd need you to do.

treeman21 commented 4 years ago

I would be happy to assist. Thank you so much for the help!


crew102 commented 4 years ago

K, here's what I need you to do:

  1. Pull the source code for wosr from GitHub (e.g., in your terminal run git clone https://github.com/vt-arc/wosr.git)
  2. Assuming you're using RStudio, open up the wosr RStudio project.
  3. On line 93 in pull-wos.R (https://github.com/vt-arc/wosr/blob/master/R/pull-wos.R#L93), insert the following line: saveRDS(all_resps, "all_resps.rds")
  4. Run devtools::load_all() in the console. If you don't have devtools installed, you'll need to install it with install.packages('devtools').
  5. Run the line that you mentioned, pull_wos(query = 'WC=("Biodiversity and Conservation" OR "Ecology" OR "Multidisciplinary Sciences") AND TS=("biodiversity") AND PY=(2010-2019)',editions = c("SCI"),sid = auth(username=NULL,password=NULL))
  6. all_resps.rds should now appear in the wosr source folder. Please upload it to this thread. Once I have that, I'll be able to debug the issue.

treeman21 commented 4 years ago

The rds file size for the original pull was too big to upload here, so I am attaching the zipped file for the following call: pull_wos(query = 'WC=("Multidisciplinary Sciences") AND TS=("biodiversity") AND PY=(2019) AND DT=("Article")',editions = c("SCI"),sid = auth(username=NULL,password=NULL))

all_resps.rds.zip

crew102 commented 4 years ago

Hey @treeman21, so it looks like there isn't an email_addr element in the XML that the API serves anymore. If you know that some of the authors do have email addresses (e.g., by confirming on the WoS web portal that at least one of the publications returned by your query has an author email on it), then it's an issue with the API. Otherwise, it could just be that none of the publications returned by your query have any author email addresses associated with them (which I kinda doubt). See the attached zip file for the XML that the API returns in response to your query.

wos-xml.xml.zip

treeman21 commented 4 years ago

Do you know whether the email_addr field pulled emails from the address field or whether this was an entirely separate field? The email address(es) for the corresponding author(s) are always included at the end of the address field when I view each individual publication online and when I export the reference to EndNote. Perhaps they changed from including this as a separate field and merged it with the address information? In any case, it looks like the email addresses could be isolated from the address field and associated with publications, even if not matched with individual authors on each publication. Is there any chance you could extract this information from the address field? Thanks again for your help!

crew102 commented 4 years ago

The email address nodes used to be located at the //summary/names/name/email_addr element, alongside display_name, first_name, last_name, etc. Based on the current documentation from Clarivate, it looks like the email address node no longer exists (e.g., see https://api.clarivate.com/swagger-ui/?url=https%3A%2F%2Fdeveloper.clarivate.com%2Fapis%2Fwos%2Fswagger and screenshot below). It appears as though email_addr is totally gone: it doesn't appear in the /summary/names node or in any other node. You may want to contact Clarivate and ask what happened to this data, and whether it's retrievable somehow. I've attached a sample of the XML for 100 of the publications returned by your query. I don't see email addresses in any of the XML.

Capture

wos-xml-sample.zip
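
For anyone wanting to repeat the check on their own responses, a rough base-R sketch follows. It assumes the XML is available as character strings. The toy records and the `has_email_node` helper are made up for illustration, and this is a plain text search rather than a real XPath query, so an XML parser would be more robust.

```r
# Toy XML standing in for API responses; the structure is a made-up minimal
# example, not a real WoS record.
xml_without_email <- paste0(
  "<records><REC><static_data><summary><names>",
  "<name role=\"author\"><display_name>Smith, J</display_name></name>",
  "</names></summary></static_data></REC></records>"
)
xml_with_email <- sub(
  "</display_name>",
  "</display_name><email_addr>jsmith@example.edu</email_addr>",
  xml_without_email,
  fixed = TRUE
)

# TRUE for each string that contains an opening email_addr tag.
has_email_node <- function(xml) {
  grepl("<email_addr", xml, fixed = TRUE)
}

found <- has_email_node(c(xml_without_email, xml_with_email))
print(found)
```

If every element of the result is FALSE for a real set of responses, that is consistent with the element having been dropped from the API output.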

treeman21 commented 4 years ago

Thanks for looking into this. I will let you know if I learn more.