az0 / entity-metadata

Basic metadata on entities such as people, corporations, schools, and churches
GNU General Public License v3.0
dataset natural-language-processing

Entity Metadata

This repository contains code and data about people and organizations. Potential uses include building training and evaluation data sets.

Data sets

| Entity | Source | Download |
| --- | --- | --- |
| Physician | CMS | CSV |
| Author | Open Library | CSV |
| Academic author | Open Academic Graph | CSV |
| Person | Wikidata | CSV |
| Person: Nicknames | onyxrev | CSV |
| Voter | Florida Voter Registration | CSV |
| Voter | North Carolina Voter Registration | CSV |
| Church | Wikidata via SPARQL | CSV |
| Licensee | US States | CSV |
| Inmate | Florida | CSV |
| Deceased | Veterans Affairs | CSV |
| Public school | California Department of Education | CSV |
| College | US Department of Education | CSV |
| Radio and TV station | Wikidata | CSV |

Using PetScan

PetScan is a simple way to get a list of articles in a category from Wikipedia. For more advanced use, SPARQL might be better.
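For comparison, here is a minimal sketch of the SPARQL route using only the Python standard library. It is not taken from this repository's scripts. The helper names `build_query`, `build_url`, and `fetch` are hypothetical, and the default class ID `Q16970` (assumed to be "church building") should be checked on Wikidata before use. `Q30` is the United States, and `https://query.wikidata.org/sparql` is the public Wikidata Query Service endpoint.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(class_qid="Q16970", country_qid="Q30", limit=100):
    """Return a SPARQL query for items of a class (or its subclasses) in a country.

    The default class_qid is an assumption (church building); verify it on Wikidata.
    """
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} ;\n"
        f"        wdt:P17 wd:{country_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_url(query):
    """Encode the query as a GET request URL that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return ENDPOINT + "?" + params


def fetch(query, user_agent="entity-metadata-example/0.1"):
    """Run the query and return a list of (item URI, English label) pairs.

    This performs a network request; the User-Agent string is a placeholder.
    """
    request = urllib.request.Request(
        build_url(query), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return [
        (b["item"]["value"], b.get("itemLabel", {}).get("value", ""))
        for b in data["results"]["bindings"]
    ]

# Example (requires network access):
#   rows = fetch(build_query(limit=10))
```

Keeping query construction separate from the network call makes the query easy to inspect or paste into the Wikidata Query Service web interface before running it from code.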

The steps below export a list of articles in a category from PetScan as a CSV file. The CSV includes the Wikidata IDs, which can be fed to the script wikidata_org.py in this repository to look up their metadata.

  1. Go to PetScan
  2. Set categories to Churches in the United States
  3. Click the Wikidata tab
  4. Click the Add items, where available option
  5. Click the Output tab
  6. Click the CSV option
  7. Click the Do It button
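Once the CSV is downloaded, the Wikidata IDs need to be pulled out of it. The sketch below is one hedged way to do that. The exact PetScan column names are not documented here, so instead of assuming a header it scans every cell and keeps values that look like a Wikidata ID (`Q` followed by digits). The function name `extract_qids` and the sample rows are hypothetical, and the input format that wikidata_org.py expects is not shown, so check the script before piping IDs into it.

```python
import csv
import io
import re

# A Wikidata item ID is the letter Q followed by a positive integer.
QID_RE = re.compile(r"Q[1-9]\d*")


def extract_qids(csv_file):
    """Yield each unique Wikidata ID found in any cell, in first-seen order.

    Scanning all cells avoids depending on a specific PetScan column name.
    A page whose title is literally of the form Q123 would also match.
    """
    seen = set()
    for row in csv.reader(csv_file):
        for cell in row:
            value = cell.strip()
            if QID_RE.fullmatch(value) and value not in seen:
                seen.add(value)
                yield value


# Hypothetical sample standing in for a PetScan export; IDs are placeholders.
sample = (
    "title,Wikidata\n"
    "Example Church A,Q100\n"
    "Example Church B,\n"
    "Example Church C,Q200\n"
    "Example Church A,Q100\n"
)
qids = list(extract_qids(io.StringIO(sample)))
print(qids)  # ['Q100', 'Q200']
```

For a real export, open the downloaded file with `open(path, newline="", encoding="utf-8")` and pass the file object to `extract_qids`. Rows without a Wikidata item are skipped, and duplicates are dropped.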

Large files

The original and processed data sets can be very large, so most of them are not committed to this repository. Instead, do one of the following:

  1. Download the original data and use the programs here to process it.
  2. Download the processed files from the SourceForge repository.

License

Written by Andrew Ziem. Copyright (c) 2017-2020 Compassion International.

The code is licensed under the GNU General Public License version 3, and the data sets belong to the original data owners. Please consult the original sources for data licenses.

Other resources

Roster of professional licenses

Lists of names of businesses from government sources

Contact information

While this repository contains names of entities and some other metadata, it does not contain any contact information such as mailing addresses, email addresses, or telephone numbers.

Search keywords