dhicks / comp-HOPOS

Building a comprehensive* dataset of 20th century philosophy of science

This repository contains the complete scripts and data files used to construct the Computational History of Philosophy of Science (Comp HOPOS) dataset. This readme file contains information on reproducing the dataset (and modifying the scripts for other purposes); a license statement; a brief overview of the method by which the dataset is constructed; and an overview of the files included in the repository.

The dataset can be downloaded at https://doi.org/10.5281/zenodo.1400633. The downloads include a data dictionary. A paper describing the motivations and construction method in more detail is available at .

These scripts were developed by Daniel J. Hicks, Rick Morris, and Evelyn Brister. The repository is maintained by Daniel J. Hicks. To report errors or other issues, please use the issue tracker for this repository (preferred) or email hicks.daniel.j@gmail.com.

Reproducibility

In principle, the dataset can be reproduced by running the scripts (the R files) in numerical order. There are, of course, a number of further details and complications.
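As a rough sketch of what "in numerical order" could look like in practice, the following assumes that each script's filename begins with a number. That naming pattern and the helper names `ordered_scripts` and `run_in_order` are assumptions for illustration, not something this readme specifies.

```r
# Hypothetical helper: list the R scripts in a directory, ordered by the
# number at the start of each filename (an assumed naming convention).
ordered_scripts <- function(dir = ".") {
  files <- list.files(dir, pattern = "^[0-9]+.*\\.[Rr]$")
  prefix <- as.numeric(sub("^([0-9]+).*$", "\\1", files))
  # Order numerically so that, e.g., 2 comes before 10.
  files[order(prefix, files)]
}

# Hypothetical helper: source each script in turn; an error in any script
# stops the loop.
run_in_order <- function(dir = ".") {
  for (f in ordered_scripts(dir)) {
    message("Running ", f)
    source(file.path(dir, f), chdir = TRUE)
  }
  invisible(NULL)
}
```

Calling `run_in_order()` from the repository root would run every matching script in sequence. Given the complications mentioned above, running the scripts one at a time and inspecting the intermediate results may be the safer route.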

Be nice and share your email with Crossref

The Crossref team encourage requests with appropriate contact information and will forward you to a dedicated API cluster for improved performance when you share your email address with them. https://github.com/CrossRef/rest-api-doc#good-manners--more-reliable-service

To pass your email address to Crossref via this client, simply store it as an environment variable in .Renviron like this:

  1. Open file: file.edit("~/.Renviron")
  2. Add the email address to be shared with Crossref: crossref_email = name@example.com
  3. Save the file and restart your R session
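One way to check that the steps above took effect is sketched here with base R only. To avoid touching the real ~/.Renviron, the snippet writes the same kind of entry to a temporary file, loads it with `readRenviron()`, and reads the variable back; the temporary file is purely for illustration.

```r
# Write an example entry to a temporary file instead of the real ~/.Renviron.
renviron <- tempfile(fileext = ".Renviron")
writeLines("crossref_email=name@example.com", renviron)

# Load the file into the current session
# (R reads ~/.Renviron automatically at startup).
readRenviron(renviron)

# Confirm that the variable is visible to R.
email <- Sys.getenv("crossref_email")
print(email)
```

After restarting R with the real file in place, `Sys.getenv("crossref_email")` should likewise return the stored address; an empty string means the variable was not picked up.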

Don’t wanna share your email any longer? Simply delete it from ~/.Renviron

Copyright and License

Copyright (c) 2018 Daniel J. Hicks, Rick Morris, and Evelyn Brister

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Dataset Construction

The primary source of data for the Comp HOPOS dataset is CrossRef, which maintains registration records for the (vast) majority of digital object identifiers (DOIs). Recently, many scholarly publishers have been "minting" DOIs for their archives; combined with an elegant R API, this makes it possible to easily and rapidly retrieve a complete set of metadata records for many scholarly journals.
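To give a sense of what retrieving a journal's records might look like, here is a minimal sketch using the rcrossref package. It assumes rcrossref is installed and that its `cr_journals()` function is queried by ISSN. The wrapper name `fetch_journal_records`, the record cap, and the placeholder ISSN are illustrative assumptions, not the exact calls used in this repository's scripts.

```r
# Hypothetical wrapper around rcrossref::cr_journals(); it needs the
# rcrossref package and a network connection when actually called.
fetch_journal_records <- function(issn, max_records = 5000) {
  stopifnot(is.character(issn), length(issn) == 1)
  res <- rcrossref::cr_journals(
    issn = issn,
    works = TRUE,              # return the journal's works, not just the journal entry
    cursor = "*",              # deep paging, for journals with many records
    cursor_max = max_records,  # upper bound on the number of records retrieved
    limit = 1000               # records requested per page
  )
  res$data                     # the metadata records (assumed to be a data frame)
}

# Example call (not run here); replace the placeholder with a real ISSN:
# records <- fetch_journal_records("XXXX-XXXX")
```

Wrapping the call in a function keeps the ISSN and the record cap in one place, which may help when the same request is repeated for several journals.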

Two other sources of data are incorporated into the Comp HOPOS dataset. Chapter-level DOIs for the Boston and Western Ontario book series are scraped from search results in Springer's public search engine. CrossRef is then used to retrieve the metadata for a chapter in a standard format. For the Minnesota book series, Evelyn Brister retrieved the data from the University of Minnesota website manually, with assistance from Aaron Crespo.

After combining these data sources, author names are disambiguated. Using the resulting canonical names, philosophers of science are identified, and a gender is attributed to each of them based on the author's name.

Currently, philosophers of science are identified as authors who (after name disambiguation) have 2 or more articles in an identified set of "primary philosophy of science" journals (including the three book series).
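The identification rule can be illustrated with a small base R sketch. The records, the journal names, and the function name `identify_philosophers` are all invented for illustration; the actual scripts may organize this step differently.

```r
# Toy records: one row per (disambiguated author, item). All values are invented.
records <- data.frame(
  author  = c("A. Author", "A. Author", "B. Writer", "C. Scholar", "C. Scholar"),
  journal = c("Journal X", "Journal Y", "Journal X", "Journal X", "Other Journal"),
  stringsAsFactors = FALSE
)

# Hypothetical set of "primary philosophy of science" venues.
primary_journals <- c("Journal X", "Journal Y")

# Count each author's items in the primary set, then apply the threshold.
identify_philosophers <- function(records, primary, threshold = 2) {
  in_primary <- records[records$journal %in% primary, , drop = FALSE]
  counts <- table(in_primary$author)
  names(counts)[as.vector(counts) >= threshold]
}

philosophers <- identify_philosophers(records, primary_journals)
print(philosophers)
```

With these toy records only "A. Author" meets the threshold: "C. Scholar" also has two items, but only one of them is in the primary set.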

Files Overview