18F / 2015-foia-hub

A consolidated FOIA request hub.

Explore using foiaonline documents for Krang #656

Closed geramirez closed 9 years ago

geramirez commented 9 years ago

foiaonline data appears to have a couple of attributes that will make it easy to ingest into Krang.

Detailed metadata

{
  "agency": "DOC",
  "author": null,
  "download_url": "https://foiaonline.regulations.gov/foia/action/getContent?objectId=pYyV_RXjeHl_90JW9libqXfxlcBvE4KS",
  "exemptions": null,
  "file_size": "1.4623193740844727",
  "file_type": "pdf",
  "landing_id": "090004d2801e59c4",
  "landing_url": "https://foiaonline.regulations.gov/foia/action/public/view/record?objectId=090004d2801e59c4",
  "released_on": "2014-03-19",
  "released_original": "Wed Mar 19 15:05:50 EDT 2014",
  "request_id": "DOC-NOAA-2014-000426",
  "retention": "6 year",
  "title": "001 20140116 Email message from MStoecker re Searsville Interim Measures Recomendations",
  "type": "record",
  "unreleased": false,
  "year": "2014"
}
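As a rough sketch of how a record like this could be loaded before ingestion: the field names below come from the sample above, but `FoiaRecord` and `parse_record` are hypothetical names for illustration, not part of Krang. It assumes `released_on` is always an ISO date or null.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FoiaRecord:
    """Hypothetical container for the fields an ingest step would need."""
    request_id: str
    agency: str
    title: str
    file_type: str
    released_on: Optional[date]
    download_url: str
    landing_url: str


def parse_record(raw: str) -> FoiaRecord:
    """Parse one foiaonline metadata JSON document into a FoiaRecord."""
    data = json.loads(raw)
    released = data.get("released_on")
    return FoiaRecord(
        request_id=data["request_id"],
        agency=data["agency"],
        title=data["title"],
        file_type=data["file_type"],
        # Assumption: released_on is "YYYY-MM-DD" or null.
        released_on=date.fromisoformat(released) if released else None,
        download_url=data["download_url"],
        landing_url=data["landing_url"],
    )


# Trimmed copy of the sample record shown above.
SAMPLE = """
{
  "agency": "DOC",
  "download_url": "https://foiaonline.regulations.gov/foia/action/getContent?objectId=pYyV_RXjeHl_90JW9libqXfxlcBvE4KS",
  "file_type": "pdf",
  "landing_url": "https://foiaonline.regulations.gov/foia/action/public/view/record?objectId=090004d2801e59c4",
  "released_on": "2014-03-19",
  "request_id": "DOC-NOAA-2014-000426",
  "title": "001 20140116 Email message from MStoecker re Searsville Interim Measures Recomendations"
}
"""

record = parse_record(SAMPLE)
print(record.request_id, record.released_on)
```

Keeping both `download_url` and `landing_url` on the record means the search results can link back to foiaonline rather than re-hosting the file.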

Already hosted online

We may be able to direct people to foiaonline for viewing and downloading, using each record's download_url, e.g. "download_url": "https://foiaonline.regulations.gov/foia/action/getContent?objectId=pYyV_RXjeHl_90JW9libqXfxlcBvE4KS"

Krang's ES search is more robust and provides snippets.

Below are searches for "mosquito" in both Krang and foiaonline. This iteration of Krang contains only 1000 documents, and it can already find 55 documents containing the word mosquito, sorted by relevance or by date.

screen shot 2015-03-23 at 3 44 27 pm

screen shot 2015-03-23 at 3 43 31 pm
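For reference, a search like the one above, with snippets and a choice of ordering, could be expressed as an Elasticsearch request body along these lines. This is only a sketch: the field names `text` and `released_on` are assumptions about how the index is mapped, and `build_search_body` is a hypothetical helper, not Krang's actual query code.

```python
from typing import Any, Dict


def build_search_body(
    term: str,
    sort_by: str = "relevance",
    text_field: str = "text",          # assumed name of the full-text field
    date_field: str = "released_on",   # assumed name of the date field
) -> Dict[str, Any]:
    """Build an Elasticsearch request body with highlighting (snippets)."""
    body: Dict[str, Any] = {
        # Full-text match on the document body.
        "query": {"match": {text_field: term}},
        # Ask Elasticsearch to return highlighted fragments for each hit.
        "highlight": {"fields": {text_field: {}}},
    }
    if sort_by == "date":
        # Newest first; without a "sort" key, hits are ordered by relevance score.
        body["sort"] = [{date_field: {"order": "desc"}}]
    elif sort_by != "relevance":
        raise ValueError("sort_by must be 'relevance' or 'date'")
    return body


by_relevance = build_search_body("mosquito")
by_date = build_search_body("mosquito", sort_by="date")
```

The resulting dictionary can be sent as the body of a search request; the highlighted fragments come back alongside each hit.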

Converting Documents

Although the majority of documents in the sample foiaonline dataset I've downloaded are PDFs, there are also additional file types: .mystery, .zip, Excel, Word, and images. Documents that have not been OCRed are taking about 99% of the conversion time. We might need to figure out a better way to handle these documents.
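One possible way to handle this, sketched below, is to route each file by type and push anything that still needs OCR onto a separate queue so it does not hold up the documents that already have text. Everything here is an assumption for illustration: the pipeline names, the extension sets, and the idea of checking whether text extraction returned anything are not taken from Krang.

```python
# Hypothetical extension groups; the exact file_type strings in the dataset may differ.
ARCHIVE_TYPES = {"zip"}
OFFICE_TYPES = {"doc", "docx", "xls", "xlsx"}
IMAGE_TYPES = {"jpg", "jpeg", "png", "tif", "tiff", "gif"}


def choose_pipeline(file_type: str, extracted_text: str = "") -> str:
    """Pick a processing route for one file.

    extracted_text is whatever a plain text-extraction pass returned;
    an empty result for a PDF is treated as "not OCRed yet".
    """
    kind = file_type.lower().lstrip(".")
    if kind in ARCHIVE_TYPES:
        return "unpack"            # expand, then route each member file
    if kind in OFFICE_TYPES:
        return "office_convert"    # convert to text with an office converter
    if kind in IMAGE_TYPES:
        return "ocr_queue"         # images always need OCR
    if kind == "pdf":
        # Index right away if the PDF already has a text layer,
        # otherwise defer it to the slow OCR queue.
        return "index" if extracted_text.strip() else "ocr_queue"
    return "manual_review"         # e.g. .mystery and other unknown types


routes = [
    choose_pipeline("pdf", "Searsville Interim Measures"),
    choose_pipeline("pdf"),
    choose_pipeline(".zip"),
    choose_pipeline("mystery"),
]
print(routes)
```

With a split like this, the bulk of the already-OCRed PDFs would become searchable quickly while the OCR queue is worked through in the background.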

Bottom line

foiaonline documents will not be difficult to use in Krang because they are well organized and mostly OCRed. Using Krang immediately uncovered topics that were difficult to find using foiaonline's search.

rjmajma commented 9 years ago

That's awesome, @ramirezg.