DataONEorg / scythe

Scythe, the data citation harvester
Other
7 stars 2 forks source link

Version 0.9.0 #10

Closed jeanetteclark closed 3 years ago

jeanetteclark commented 3 years ago

I think the package is in a state to be released at 0.9.0 as a minimum viable product.

Current functionality:

R CMD check is passing in GitHub Actions on Mac OS

mbjones commented 3 years ago

That's great, @jeanetteclark . I will give it a sanity check and submit a quick review ASAP.

jeanetteclark commented 3 years ago

I think we should refactor the tests to be aware of which sources are configured with keys, and skip testing for IDs that we don't have keys loaded for

I did this by adding a new function scythe_get_key which looks for set keys both via keyring and the env.var backdoor. I also added a column in the test data table that contains the source. So, the results from scythe_set_key are used to filter the test table for sources we have keys for, and also to use in the new sources argument in citation_search

I think we should slightly refactor citation_search to give the user control over which sources are searched. I've been debating whether it would be better to default to just plos, or to inlcude the others that require an API key as well. The function signature would be something like citation_search(identifiers, sources=c("plos"))

Done. Default is to look across all 3 sources.

The citation_search should parallelize across the source searches if the user has the parallel package installed

Done - but I had to limit the number of cores used to 2 otherwise the package fails CMD checks. It barely gives us performance improvement though, since the PLOS call is by far the slowest (1 call every 6 seconds, vs 1 call/second, and 9 calls/second). Here are the performance results for 100 identifiers:

in parallel with 2 cores:

user  system elapsed 
8.877   4.259 669.064

in series:

user  system elapsed 
8.370   2.672 679.697
jeanetteclark commented 3 years ago

FYI I'm doing more testing on the parallel implementation. I didn't realize I had env var tokens set yesterday, which worked fine. It seems to be choking pretty hard on the interactive keyring password unlock when keys are set normally, so I'm trying to find a fix for that

mbjones commented 3 years ago

@jeanetteclark I just looked through all of the issues, and successfully ran checks. So, this is ready to merge and I will go ahead and do so now. I had experimented with an alternative implementation of citation_search that avoided hardcoding source names, which worked locally but failed on the GitHub actions check, so I reverted that in the interests of just getting this out. We can revisit why that failed later.