18F / API-All-the-X

Resources and Materials for the /Developer Program
https://github.com/18F/API-All-the-X/tree/master/pages

APIs in data.json files #186

Open gbinal opened 9 years ago

gbinal commented 9 years ago

Data.gov API Results

gbinal commented 9 years ago

I believe these are the URLs for the above:

http://usda.gov/data.json
http://www.commerce.gov/data.json
http://nist.gov/data.json
http://data.noaa.gov/data.json
http://www.defense.gov/data.json
http://www2.ed.gov/data.json
http://www.energy.gov/data.json
http://nrel.gov/data.json
http://healthdata.gov/data.json
http://www.dhs.gov/sites/default/files/publications/digital-strategy/data.json
http://www.hud.gov/data.json
http://www.doi.gov/data.json
http://www.justice.gov/data.json
http://www.dol.gov/data.json
http://www.state.gov/data.json
http://www.dot.gov/data.json
http://treasury.gov/data.json
http://www.va.gov/data.json
http://www.usaid.gov/data.json
http://www.epa.gov/data.json
http://www.gsa.gov/data.json
http://www.nasa.gov/data.json
http://www.archives.gov/data.json
http://www.nrc.gov/data.json
http://www.nsf.gov/data.json
http://www.opm.gov/data.json
https://www.sba.gov/sites/default/files/data.json
http://www.ssa.gov/data.json
http://www.consumerfinance.gov/data.json
http://www.fhfa.gov/data.json
http://www.imls.gov/data.json
http://data.mcc.gov/raw/index.json
http://www.nitrd.gov/data.json
http://www.ntsb.gov/data.json
http://www.sec.gov/data.json
https://open.whitehouse.gov/data.json

harrisj commented 9 years ago

This is the broken way the NSF represents API records in their data.json

{
  "@type": "dcat:Dataset",
  "title": "NSF Award Search Web API",
  "accessLevel": "public",
  "contactPoint": {
    "@type": "vcard:Contact",
    "fn": "Nancy Kaplan",
    "hasEmail": "mailto:nkaplan@nsf.gov"
  },
  "description": "The NSF Award Search web API provides a web API interface to the Research.gov's Research Spending and Results data, which provides NSF research award information from 2007.",
  "identifier": "1102",
  "keyword": [
    "nasa",
    "national aeronautics and space administration national aeronautics and space administration stem",
    "national science foundation",
    "nsf",
    "research and education",
    "science and engineering"
  ],
  "license": "http://www.nsf.gov/",
  "modified": "P1D",
  "publisher": {
    "@type": "org:Organization",
    "name": "National Science Foundation"
  },
  "distribution": [
    {
      "@type": "dcat:Distribution",
      "downloadURL": "http://www.research.gov/common/webapi/awardapisearch-v1.htm",
      "mediaType": "application/json"
    }
  ],
  "bureauCode": [
    "422:00"
  ],
  "programCode": [
    "422:011"
  ]
}

harrisj commented 9 years ago

This is the nonstandard data.json that NREL uses

{
  "title": "PVWatts",
  "description": "PVWatts calculates the energy production and cost savings of grid-connected photovoltaic (PV) energy systems. This service estimates the performance of hypothetical residential and small commercial PV installations.",
  "keyword": "solar, photovoltaic, PV, calculator, payback",
  "modified": "2013-05-01",
  "publisher": "National Renewable Energy Laboratory",
  "person": "NREL Open Data",
  "mbox": "data@nrel.gov",
  "identifier": "392cf124-b37e-4d3c-b04c-29a2fd3cfabd",
  "accessLevel": "public",
  "webService": "http://developer.nrel.gov/api/pvwatts/v4.json",
  "landingPage": "http://developer.nrel.gov/doc/pvwatts",
  "references": "http://developer.nrel.gov/doc/api/pvwatts/v4",
  "spatial": "United States"
},

harrisj commented 9 years ago

Agencies with no API records in their data.json:

harrisj commented 9 years ago

This is how the MCC represents their data.json

{
  "publisher": "Millennium Challenge Corporation",
  "license": "data.mcc.gov terms of use - http://data.mcc.gov/termsofuse.html",
  "description": "MCC Open Data API",
  "language": "English",
  "title": "Open Data API",
  "issued": "5/1/13 0:00",
  "format": "json",
  "landingPage": "http://data.mcc.gov/developers",
  "modified": "5/1/13 0:00",
  "systemOfRecords": "Open Data Catalog",
  "person": "Open Data Initiative",
  "theme": "Open Data API",
  "keyword": "data, api",
  "identifier": "data-api",
  "dataDictionary": "http://data.mcc.gov/performance/projects.html",
  "accessLevel": "Public",
  "mbox": "opendata@mcc.gov",
  "webService": "http://data.mcc.gov/api"
},

harrisj commented 9 years ago

The following agencies are giving me errors when I attempt to crawl them

Timeouts

Malformed JSON

404 Not Found
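
For anyone reproducing the crawl, here is a rough sketch of how a fetch could be sorted into the three buckets above. It is not the crawler used in this thread. The function name `crawl_status`, the injectable `opener` parameter, and the 30-second default timeout are all assumptions made for illustration.

```python
import json
import socket
import urllib.error
import urllib.request


def crawl_status(url, opener=urllib.request.urlopen, timeout=30):
    """Fetch one data.json URL and return (status, parsed_catalog_or_None).

    `opener` is a hypothetical hook so the function can be exercised
    without network access; by default it is urllib's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            body = resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError (it is a subclass).
        status = "404 Not Found" if err.code == 404 else f"HTTP {err.code}"
        return status, None
    except (socket.timeout, TimeoutError):
        return "Timeout", None
    except urllib.error.URLError as err:
        # Connection-phase timeouts can arrive wrapped in URLError.
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "Timeout", None
        return f"Error: {err.reason}", None
    try:
        return "OK", json.loads(body)
    except ValueError:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
        return "Malformed JSON", None
```

Calling `crawl_status("http://www.nsf.gov/data.json")` would return a status string plus the parsed catalog when the fetch succeeds, so the statuses can be tallied per agency.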

harrisj commented 9 years ago

Here are the counts I have so far

http://www.usaid.gov/data.json
  + 5 APIs found
https://www.sba.gov/sites/default/files/data.json
  + 3 APIs found
http://www.consumerfinance.gov/data.json
  + 2 APIs found
http://www.archives.gov/data.json
  + 3 APIs found
http://www.dot.gov/data.json
  + 12 APIs found
http://www.ssa.gov/data.json
  + 3 APIs found
http://www.opm.gov/data.json
  + 5 APIs found
http://www.gsa.gov/data.json
  + 9 APIs found
http://treasury.gov/data.json
  + 5 APIs found
http://usda.gov/data.json
  + 62 APIs found
http://www.hud.gov/data.json
  + 34 APIs found
http://www.epa.gov/data.json
  + 430 APIs found
http://www.commerce.gov/data.json
  + 5 APIs found
http://www.energy.gov/data.json
  + 19 APIs found
http://www.dol.gov/data.json
  + 181 APIs found
http://healthdata.gov/data.json
  + 3 APIs found
http://www.va.gov/data.json
  + 2 APIs found
http://www.nasa.gov/data.json
  + 4 APIs found
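
These counts depend entirely on how an API record is recognized, and the records quoted earlier in the thread come in at least two shapes: a top-level `webService` field (NREL, MCC) and a `distribution` list (NSF). Below is a minimal sketch of a detector that handles both. It is not the code that produced the counts above. The function names are hypothetical, the `format: "API"` plus `accessURL` check reflects my understanding of the Project Open Data v1.1 schema and should be treated as an assumption, and the title check for NSF-style records is only a guess at a workable heuristic.

```python
def api_urls(record):
    """Return the URLs in one dataset record that look like API endpoints."""
    urls = []

    # Flat records (NREL and MCC above) use a top-level webService field.
    web_service = record.get("webService")
    if isinstance(web_service, str) and web_service:
        urls.append(web_service)

    # Records with a distribution list carry the URL on each distribution.
    title = str(record.get("title", "")).lower()
    for dist in record.get("distribution") or []:
        if not isinstance(dist, dict):
            continue
        fmt = str(dist.get("format", "")).strip().lower()
        if fmt == "api" and dist.get("accessURL"):
            # Assumed v1.1 convention: accessURL with format "API".
            urls.append(dist["accessURL"])
        elif dist.get("downloadURL") and "api" in title:
            # Heuristic for the NSF-style record: an API filed as a download.
            urls.append(dist["downloadURL"])
    return urls


def count_apis(catalog):
    """Count records with at least one API-looking URL.

    Accepts either a bare list of records or an object with a "dataset" list.
    """
    datasets = catalog.get("dataset", []) if isinstance(catalog, dict) else catalog
    return sum(1 for rec in datasets if isinstance(rec, dict) and api_urls(rec))
```

A substring check on the title will produce false positives, which is one plausible way to end up with inflated counts.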

Excessively high counts indicate that something is wrong in the data. For instance, here are some of the "APIs" returned from the EPA's data.json:

  + file:////r6gis1/share1/Facilities/FRP/R6_FRP_20110505.gdb
  + file:////r6gis1/share1/Facilities/NPL/2012/NPL_2012.gdb/NPLpy09182012
  + http://www.epa.gov/superfund/sites/npl/status.htm
  + file:////r6gis1/share1/Admin/OK/OK_EmergencyManagementDirectors.gdb/OEMDirectors_Table
  + file:////r6gis1/share1/admin/OK/OK_Corporation_Commision_Districts.gdb/OK_Corporation_Commision_Districts
  + file:////r6gis1/share1/Admin/OK/OK_OHS_Regional_Response_System.gdb
  + file:////r6gis1/share1/Admin/Parcels/Parcel_status_r6.shp
  + https://www.edg.epa.gov/data/public/R6/Brownfields/R6Brownfields_kmz.zip
  + https://edg.epa.gov/data/public/R6/Brownfields/R6Brownfields_062612.zip
  + http://edg.epa.gov/data/public/R6/Brownfields/R6Brownfields.zip
  + file:////r6gis1/share1/Census/Census2010/PL94171_2010.gdb/R6_PL2010_Block
  + file:////r6gis1/share1/Census/Census2010/PL94171_2010_SumByOtherGeog.gdb/R6_PL2010_BlockGroup
  + file:////r6gis1/share1/admin/NM/NM_OCD_Divisions.gdb/NM_Oil_Conservation_Divisions
  + file:////r6gis1/share1/Air/Nonattainment/Nonattainment_July2012.shp
  + file:////r6gis1/share1/Air/Nonattainment/Nonattainment_2012.gdb/Nonattainment_2012
  + file:////r6gis1/share1/Air/Nonattainment/Nonattainment_2013.gdb/Nonattainment_July2012
  + https://edg.epa.gov/data/public/r6/NPL/NPLpt05122014.zip
  + https://edg.epa.gov/data/public/r6/npl/NPLpy05122014.zip
  + https://edg.epa.gov/data/Public/R6/Aquifers/R6SSAquifers.zip
  + file:////r6gis1/share1/Facilities/TRI/tri2011/r6tri2011.gdb/R6TRI2011
  + file:////r6gis1/share1/Border/Mexico/Mexico_Data.gdb/rail
  + https://edg.epa.gov/data/Public/R6/TEAP/TEAP_Data.zip
  + file:////r6gis1/share1/Facilities/RCRA/RCRA_Sites_Mar_2012.lyr

Other agencies' data.json files likely have similar problems.
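
One way to cut down this kind of noise is to reject URLs that cannot be web services at all. The sketch below drops anything that is not `http` or `https` (which covers the `file:////r6gis1/...` paths) and anything whose path ends in one of the file extensions seen in the EPA list. The function name and the suffix list are assumptions for illustration. HTML pages are deliberately not filtered, since the NSF record quoted earlier points at a `.htm` URL, so entries like the EPA `status.htm` page would still need a manual look.

```python
from urllib.parse import urlparse

# File extensions from the EPA examples above that indicate a download
# or a GIS artifact rather than a callable service (illustrative list).
NON_API_SUFFIXES = (".zip", ".shp", ".gdb", ".lyr")


def plausible_api_url(url):
    """Return False for URLs that are clearly not web APIs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        # e.g. file:// paths on an internal network share
        return False
    return not parsed.path.lower().endswith(NON_API_SUFFIXES)
```

Applied to the EPA entries listed above, this keeps only `http://www.epa.gov/superfund/sites/npl/status.htm`.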

gbinal commented 9 years ago

More data files here: