As discussed, I added some resources into the team-resources folder, specifically:
crossref_and_related.R - contains extract_crossref(), a function to extract Crossref data if you do not have a DOI (if you do have a DOI, it is much easier, as shown in the get_citation() function in there). The matching is currently based on exact text matching - something fuzzier would probably be better.
pdf_extraction.R - implements one way to extract references from PDFs. Note that this includes a Python function that you need to either put into a separate file and call through reticulate (see the packages below) or translate into R.
wikidata_example.Rmd - just a simple Wikidata example, in case you want to use Wikidata to identify university locations. You could consider alternatives (e.g., OpenStreetMap), but they would seem to create problems with satellite campuses abroad, which give universities addresses in multiple countries (I would ignore satellite campuses for now, unless you find a reliable way to differentiate them from the main university).
screen_matches.R - a basic Shiny app that you could build on, though there are probably better templates. This screen may also be close to something we need to include eventually to check dubious Crossref matches (but that could also be a table, maybe borrowed from ASySD).
Packages to consider:
rcrossref for crossref API access
reticulate to use Python code within R (e.g., if there are self-contained functions within cleanBib) that you do not want to translate unnecessarily (most important function: source_python that just makes Python functions available to R)
bib2df to parse .bib reference files - that users can upload/provide directly
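On the fuzzier matching mentioned for extract_crossref(): here is a minimal sketch of one possible approach, not what crossref_and_related.R currently does. It compares normalised titles using Python's standard-library difflib. The function names, the 0.9 threshold, and the idea of matching on titles alone are all assumptions for illustration; a dedicated string-distance package may well work better.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]+", " ", title.lower())
    return " ".join(cleaned.split())


def title_similarity(a, b):
    """Similarity ratio in [0, 1] between two titles after normalisation."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def best_match(query, candidates, threshold=0.9):
    """Return (candidate, score) for the closest candidate title.

    Returns (None, score) when no candidate reaches the threshold.
    The default threshold of 0.9 is an arbitrary starting point, not a tuned value.
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        score = title_similarity(query, candidate)
        if score > best_score:
            best, best_score = candidate, score
    if best_score < threshold:
        return None, best_score
    return best, best_score
```

If this were saved in a separate file (say fuzzy_match.py, a hypothetical name), reticulate::source_python("fuzzy_match.py") should make best_match() callable from R. Scores just below the threshold would be natural candidates for the manual check in screen_matches.R.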