Closed jeanetteclark closed 3 years ago
That's great, @jeanetteclark . I will give it a sanity check and submit a quick review ASAP.
I think we should refactor the tests to be aware of which sources are configured with keys, and skip testing for IDs that we don't have keys loaded for
I did this by adding a new function scythe_get_key
which looks for set keys both via keyring
and the env.var backdoor. I also added a column in the test data table that contains the source. So, the results from scythe_set_key
are used to filter the test table for sources we have keys for, and also to use in the new sources argument in citation_search
I think we should slightly refactor citation_search to give the user control over which sources are searched. I've been debating whether it would be better to default to just plos, or to inlcude the others that require an API key as well. The function signature would be something like citation_search(identifiers, sources=c("plos"))
Done. Default is to look across all 3 sources.
The citation_search should parallelize across the source searches if the user has the parallel package installed
Done - but I had to limit the number of cores used to 2 otherwise the package fails CMD checks. It barely gives us performance improvement though, since the PLOS call is by far the slowest (1 call every 6 seconds, vs 1 call/second, and 9 calls/second). Here are the performance results for 100 identifiers:
in parallel with 2 cores:
user system elapsed
8.877 4.259 669.064
in series:
user system elapsed
8.370 2.672 679.697
FYI I'm doing more testing on the parallel implementation. I didn't realize I had env var tokens set yesterday, which worked fine. It seems to be choking pretty hard on the interactive keyring password unlock when keys are set normally, so I'm trying to find a fix for that
@jeanetteclark I just looked through all of the issues, and successfully ran checks. So, this is ready to merge and I will go ahead and do so now. I had experimented with an alternative implementation of citation_search
that avoided hardcoding source names, which worked locally but failed on the GitHub actions check, so I reverted that in the interests of just getting this out. We can revisit why that failed later.
I think the package is in a state to be released at 0.9.0 as a minimum viable product.
Current functionality:
citation_search()
)citation_search_plos
,citation_search_scopus
,citation_search_springer
)scythe_set_key
)write_citation_pairs
)R CMD check is passing in GitHub Actions on Mac OS