DataONEorg / scythe

Scythe, the data citation harvester

how to know it's working #24

Open atn38 opened 3 years ago

atn38 commented 3 years ago

Hello scythe team,

Kudos and thanks for developing this! I've gotten around to trying it out and, given my unfamiliarity with the databases and their APIs, I'm not sure whether it's working right. I wasn't able to find any citations for our site's data holdings, despite knowing that there are >4 (and finding them on Scopus). Relevant points imo:

(Also, I have access to the DataONE Slack but can't find my login; happy to try and find it if that's a better place for these questions.)

jeanetteclark commented 3 years ago

Hi An! So happy to hear you are testing out the package. It is definitely in its infancy and we are looking forward to developing it more and making it more robust. I tested against your identifiers and also did not get any results. Can you send a list of known citation pairs (just a couple of examples, the dataset DOI and publication DOI) so that I can have a look to try and figure out why they aren't getting picked up?

Thanks!

atn38 commented 3 years ago

well duh!

this article https://doi.org/10.5194/bg-18-1203-2021

cites a bunch:

https://doi.org/10.6073/pasta/3475cdbb160a9f844aa5ede627c5f6fe
https://doi.org/10.6073/pasta/ced2cedd430d430d9149b9d7f1919729
https://doi.org/10.6073/pasta/e0e71c2d59bf7b08928061f546be6a9a
https://doi.org/10.6073/pasta/9305328d0f1ed28fbb2d7cf56c686786

this https://doi.org/10.3389/feart.2020.598933

cites this https://doi.org/10.6073/pasta/cc4d53a91ed873765224fcb6d09f5eb7

The placement of the citation may also matter: my 1st example places the full data citation in the references, while the 2nd puts it in the data availability statement.

jeanetteclark commented 3 years ago

Hey @atn38 - I'm trying to have a look at this now but the Scopus website is giving me a hard time. If you can jump on the DataONE slack to chat about it, that would be helpful! Otherwise I can send you an invite to the NCEAS slack

jeanetteclark commented 3 years ago

After doing some digging, I realized that passing a list of identifiers wasn't working as expected - only the first identifier was searched. I fixed this in c295a3b, and also made sure to drop empty rows from result data frames so that you don't end up with a bunch of empty rows in the result set.
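For readers following along, the shape of that fix can be sketched roughly as below. This is a minimal illustration in Python, not the package's actual code: `search_all`, the `search_one` callback, and the `article_id` / `dataset_id` column names are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]


def search_all(
    identifiers: Iterable[str],
    search_one: Callable[[str], List[Row]],
) -> List[Row]:
    """Query every identifier (not only the first) and drop empty rows.

    `search_one` is a hypothetical stand-in for a single-identifier
    lookup against a citation source; it returns a list of row dicts.
    """
    rows: List[Row] = []
    for dataset_id in identifiers:
        for row in search_one(dataset_id):
            # Treat a row without an article identifier as empty and skip it.
            if not row.get("article_id"):
                continue
            rows.append({"dataset_id": dataset_id, **row})
    return rows
```

The point of the sketch is only that the loop covers the whole list and that rows with no article identifier are filtered out before the results are combined.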

@atn38 if you install from develop and see the results you are expecting, let me know and I'll close this issue. I might make a minor release to incorporate these bug fixes into main

mbjones commented 3 years ago

Thanks for the report @atn38, and the fix @jeanetteclark. Does this bug affect our previous searches that @MayaSamet executed, such that it might be useful to repeat them now that it is fixed?

jeanetteclark commented 3 years ago

@mbjones no, it does not - the bug was introduced towards the end of my last development push, about a month ago

atn38 commented 3 years ago

@jeanetteclark, I ran it again from the develop branch and was able to run citation_search and have it return results, so hooray!

With that said, the function didn't pick up these two articles, which AFAIK are indexed in Scopus and do cite our data DOIs:

https://doi.org/10.3389/feart.2020.598933
https://doi.org/10.1029/2020GB006552
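A check like this can be made repeatable by comparing known (dataset DOI, article DOI) pairs against whatever the search returns. The following is a hedged sketch in Python, assuming the results have been exported as rows with `dataset_id` and `article_id` fields; those field names and both helper functions are hypothetical and not part of the package.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (dataset DOI, article DOI)


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common prefixes so variants compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def missed_pairs(known: Iterable[Pair], results: Iterable[Dict[str, str]]) -> List[Pair]:
    """Return the known citation pairs that do not appear in the results."""
    found = {
        (normalize_doi(r["dataset_id"]), normalize_doi(r["article_id"]))
        for r in results
    }
    return [
        (d, a) for d, a in known
        if (normalize_doi(d), normalize_doi(a)) not in found
    ]


# Known pair taken from earlier in this thread.
known = [
    ("https://doi.org/10.6073/pasta/cc4d53a91ed873765224fcb6d09f5eb7",
     "https://doi.org/10.3389/feart.2020.598933"),
]
```

Calling `missed_pairs(known, results)` with the exported rows would list the pairs still not being picked up, regardless of whether the DOIs were written with or without the `https://doi.org/` prefix.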

jeanetteclark commented 3 years ago

Thanks @atn38! I'll have to look into these missed results later, since I need to focus on other things at the moment. I'll keep this issue open, and if you want to poke around, as Matt mentioned, you are welcome to submit a PR for any improvements!