How many names published in IPNI are available open access?
IPNI records DOIs against the publications in which new nomenclatural acts are found. This data element has been recorded since 2012.
graph TB
subgraph "Data access "
subgraph subgraph_padding_1 [ ]
style subgraph_padding_1 stroke-dasharray: 0 1
ipnidata["Download name publication <br>data from <b>IPNI</b>; extract DOIs"]
end
end
subgraph "Processing "
subgraph subgraph_padding_2 [ ]
style subgraph_padding_2 stroke-dasharray: 0 1
unpaywall["Lookup DOIs in <b>unpaywall</b>"]
ipnidata-->unpaywall
end
end
subgraph "Reporting "
subgraph subgraph_padding_3 [ ]
style subgraph_padding_3 stroke-dasharray: 0 1
rptoatakeup[Report on OA takeup over time]
rptoastatus[Report on OA statuses over time]
unpaywall-->rptoatakeup
unpaywall-->rptoastatus
end
end
The software is written in Python and execution is managed with the build tool make.
The command used to launch Python is defined in the Makefile as the variable python_launch_cmd; on Windows (the default) the python executable is prefixed with winpty. Comment out this line of the Makefile if you are on Linux.
APIs are used to access IPNI data (via pykew) and unpaywall (via the unpywall package).
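To illustrate what a single open access lookup involves, here is a minimal sketch that talks to the Unpaywall REST API directly using only the standard library. It is not how this project performs lookups (the project uses the unpywall package); the function names, the example DOI and the email address are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"


def build_url(doi, email):
    """Build the Unpaywall request URL for one DOI (an email is required by the API)."""
    return (
        UNPAYWALL_BASE
        + urllib.parse.quote(doi, safe="/")
        + "?"
        + urllib.parse.urlencode({"email": email})
    )


def summarise(response):
    """Keep only the two fields used in the reports: is_oa and oa_status."""
    return {"is_oa": response.get("is_oa"), "oa_status": response.get("oa_status")}


def lookup(doi, email):
    """Fetch and summarise one record (performs a network request)."""
    with urllib.request.urlopen(build_url(doi, email)) as fh:
        return summarise(json.load(fh))


if __name__ == "__main__":
    # Hypothetical DOI and email, shown without making a network call.
    print(build_url("10.1000/xyz", "me@example.org"))
```

Calling lookup once per DOI is what makes the run slow for large datasets, which is why a local cache is worthwhile.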
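Software package dependencies are specified in requirements.txt. Create a virtual environment and install them as follows (env/Scripts/activate is the Windows layout; on Linux the activation script is env/bin/activate):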
python -m venv env
source env/Scripts/activate
pip install -r requirements.txt
The Makefile includes a year_min variable which is passed to the getipninames.py script to select the initial set of records. By default this is set to 2012.
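As a rough sketch of how a script could receive such a value and use it to select records, the example below uses argparse. The --year_min option name, the year field and the sample records are assumptions for illustration; the actual interface of getipninames.py may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical --year_min option, defaulting to 2012."""
    parser = argparse.ArgumentParser(description="Select records from a starting year")
    parser.add_argument("--year_min", type=int, default=2012,
                        help="earliest publication year to include")
    return parser.parse_args(argv)


def select_records(records, year_min):
    """Keep records whose (assumed) year field is at or after year_min."""
    return [r for r in records if r["year"] >= year_min]


if __name__ == "__main__":
    # Hypothetical records standing in for downloaded name data.
    sample = [{"name": "A", "year": 2010}, {"name": "B", "year": 2016}]
    args = parse_args(["--year_min", "2015"])
    print(select_records(sample, args.year_min))
```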
A report of the actions that will be taken to build a particular Makefile target can be seen by using the --dry-run flag. For example, to see the actions taken to process the reportoa target, use make reportoa --dry-run.
The unpaywall lookup will take some time (several hours for datasets of thousands of records). The unpywall utility offers a cache option which stores the results of a lookup and uses this local cache for subsequent requests; see https://unpywall.readthedocs.io/en/latest/cache.html for details. The cache file is named unpaywall_cache and is specified in .gitignore.
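The idea behind a lookup cache can be sketched in a few lines. This is a generic illustration, not the unpywall cache implementation: the JSON file format, the function names and the fake lookup are all assumptions.

```python
import json
import os
import tempfile


def cached_lookup(doi, lookup, cache_path):
    """Return lookup(doi), reusing a result stored in a local JSON cache file."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as fh:
            cache = json.load(fh)
    if doi not in cache:
        # Only call the (slow) lookup on a cache miss, then persist the result.
        cache[doi] = lookup(doi)
        with open(cache_path, "w", encoding="utf-8") as fh:
            json.dump(cache, fh)
    return cache[doi]


calls = []  # records every time the slow lookup is actually invoked


def fake_lookup(doi):
    """Hypothetical stand-in for a slow remote lookup."""
    calls.append(doi)
    return {"doi": doi, "is_oa": True}


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo_cache.json")
    cached_lookup("10.1000/example", fake_lookup, path)
    cached_lookup("10.1000/example", fake_lookup, path)
    print("remote lookups made:", len(calls))
```

Because results persist on disk, an interrupted or repeated run only pays the lookup cost for DOIs it has not seen before.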
A complete run can be initiated with make all; the individual steps are detailed below.
Script: getipninames.py
Output: downloads/ipninames.csv
Method: using the year_min variable to select records, IPNI names are downloaded using the pykew API wrapper.
Run: make downloads/ipninames.csv or the shorthand: make getnames
Script: ipninames2oastatus.py
Input: downloads/ipninames.csv
Output: data/ipniname-oastatus.csv
Method: look up the DOIs using the unpywall API wrapper and store the results in CSV format.
Run: make data/ipniname-oastatus.csv or the shorthand: make getoastatus
Script: reportoastatus.py
Input: data/ipniname-oastatus.csv
Output: data/ipniname-oastatus-report.csv
Method: group the records by whether a DOI is present (has_doi), whether the literature is available open access (is_oa) and the open access status (oa_status: green, gold, bronze, hybrid etc.), and count the size of each group.
Run: make data/ipniname-oastatus-report.csv or the shorthand: make reportoa
Script: plotoa.py
Input: data/ipniname-oastatus-report.csv
Output: data/oatrend.png
Run: make data/oatrend.png or the shorthand: make plotoa
Script: plotoatype.py
Input: data/ipniname-oastatus-report.csv
Output: data/oastatustrend.png
Run: make data/oastatustrend.png or the shorthand: make plotoastatus
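The grouping and counting behind the report, and the per-year open access share that a trend chart would show, can be sketched with the standard library alone. This is an illustration under assumptions, not the code in reportoastatus.py or the plotting scripts: the sample rows and the year field name are hypothetical, while has_doi, is_oa and oa_status are the fields named in the report step.

```python
from collections import Counter

# Hypothetical sample rows standing in for data/ipniname-oastatus.csv.
rows = [
    {"year": 2012, "has_doi": True, "is_oa": True, "oa_status": "gold"},
    {"year": 2012, "has_doi": True, "is_oa": False, "oa_status": "closed"},
    {"year": 2012, "has_doi": False, "is_oa": None, "oa_status": None},
    {"year": 2013, "has_doi": True, "is_oa": True, "oa_status": "gold"},
    {"year": 2013, "has_doi": True, "is_oa": True, "oa_status": "gold"},
]


def count_groups(records):
    """Count records in each (year, has_doi, is_oa, oa_status) group."""
    return Counter(
        (r["year"], r["has_doi"], r["is_oa"], r["oa_status"]) for r in records
    )


def oa_share_by_year(records):
    """Return the fraction of each year's records that are open access."""
    totals = Counter(r["year"] for r in records)
    open_access = Counter(r["year"] for r in records if r["is_oa"])
    return {year: open_access[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    for group, size in count_groups(rows).most_common():
        print(group, size)
    print(oa_share_by_year(rows))
```

Records without a DOI are counted in the yearly totals here, so the share reflects all names rather than only those that could be looked up; whether that is the right denominator is a design choice.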
Two utility make targets are provided for cleaning up:
make clean - removes all processed files (i.e. the contents of the data directory)
make sterilise - removes all processed files and all downloaded files (i.e. the contents of both the data and downloads directories)
How to archive an analysis:
make all
make archive
Please use the GitHub issue tracker associated with this project to report bugs and make feature requests.
Please link your commit message to an issue which describes what is being implemented or fixed.
Any new dependencies should be added to requirements.txt and committed to git. The env directory is specified in .gitignore; please do not commit this to git.
The data and downloads directories are specified in .gitignore, so please do not commit these, or any outputs such as data files or chart images, to git. Instead you should:
Makefile
which will:
Similarly, the archive directory is specified in .gitignore; please do not commit this or any of its contents to git. Instead, follow the process laid out in the "How to archive an analysis" section above.
Nicky Nicolson, RBG Kew (n.nicolson@kew.org)