18F / 2015-foia-hub

A consolidated FOIA request hub.

KRANG: Get a handful of sample FOIA requests to better understand natural language wording and possible search terms #451

Closed jtag closed 9 years ago

jtag commented 9 years ago

@vz3 Were these DOJ samples or just general?

Also wondering (now) if we could get a better understanding just by looking through Muckrock or another public FOIA request site? https://www.muckrock.com/foi/list/

geramirez commented 9 years ago

@jtag If you don't have these yet I can look for a couple.

jtag commented 9 years ago

That would be great!

Curious what terms people rely on, and whether any patterns or priorities emerge.

Jesse

On Monday, February 2, 2015, Gabriel Ramirez notifications@github.com wrote:

@jtag https://github.com/jtag If you don't have these yet I can look for a couple.

— Reply to this email directly or view it on GitHub https://github.com/18F/foia-hub/issues/451#issuecomment-72481772.

Jesse Taggert Product Strategy, User Experience & Design 18F.gsa.gov

khandelwal commented 9 years ago

What specifically do you want to get out of this? To do this right, and ensure we represent a lot of the different requests, we'd likely need to analyze more than a handful.

jtag commented 9 years ago

I was considering this a lo-fi exploratory spike of this method.

On Monday, February 2, 2015, Shashank Khandelwal notifications@github.com wrote:

What specifically do you want to get out of this? To do this right, and ensure we represent a lot of the different requests, we'd likely need to analyze more than a handful.


geramirez commented 9 years ago

I just downloaded around 3000 from foiaonline. Will that work? Requests and agencies, along with a list of words used over 300 times.

Also some EDA here
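For anyone who wants to reproduce a word list like the one mentioned above, here is a minimal sketch using only the Python standard library. The function name `frequent_words`, the tokenization rule, and the toy sample are assumptions for illustration; they are not the script actually used on the foiaonline data.

```python
import re
from collections import Counter


def frequent_words(requests, min_count=300):
    """Return (word, count) pairs for words used at least min_count times."""
    counts = Counter()
    for text in requests:
        # Lowercase, then keep runs of letters and apostrophes as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


# Toy data, made up for illustration (not taken from foiaonline).
sample = [
    "Please send all records related to the landfill site.",
    "I request records and documents about the property.",
    "Please provide documents and photos.",
]

# A low threshold is used here only because the toy sample is tiny.
print(frequent_words(sample, min_count=2))
```

With the real download, `min_count=300` would match the cutoff described above.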

khandelwal commented 9 years ago

What did we learn from the requests about possible search terms and natural language wording? Is there a summary you can put in here?

geramirez commented 9 years ago

Summary

First of all, I'd like to point out that the sample I analyzed is biased, because it includes only requests that foiaonline decided to release. There are many more FOIA requests that foiaonline does not release to the public, and we should consider asking to analyze those texts as well. Nevertheless, the takeaways are:

  1. Requests tend to be short. 75% of the requests were under 150 words and 50% were under 65 words.
  2. Requests from professional requesters (requesters who made more than 15 requests) appeared to be longer than those from requesters who had made fewer requests.
  3. Requests tend to use the same words and structure over and over. Words like please, request, records, would, documents, etc. appear in many of the requests. At some point in the future we should analyze the effect of these words and others on processing times and responses.
  4. A quick run of LDA confirmed that requests can be useful for extracting topics, because they appear to have underlying trends. For example, some clusters of documents clearly contained the keywords photos, fingerprint, and detained, while others were about site, property, and landfill. As we move forward we should develop ways of analyzing requests to improve the service.
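
The length figures in points 1 and 2 could be recomputed along these lines. This is a rough sketch, not the original analysis: the helper names, the `(requester_id, text)` record layout, and the toy data are all assumptions, and only the "more than 15 requests" cutoff comes from the summary above.

```python
from collections import defaultdict
from statistics import median, quantiles


def word_count(text):
    """Count whitespace-separated words in a request."""
    return len(text.split())


def length_quartiles(requests):
    """Return the 25th, 50th, and 75th percentiles of request length in words."""
    return quantiles([word_count(text) for text in requests], n=4)


def median_length_by_group(records, threshold=15):
    """Compare request lengths for frequent and occasional requesters.

    records is an iterable of (requester_id, text) pairs (assumed layout).
    A requester with more than `threshold` requests counts as frequent.
    """
    by_requester = defaultdict(list)
    for requester, text in records:
        by_requester[requester].append(word_count(text))
    groups = {"frequent": [], "occasional": []}
    for lengths in by_requester.values():
        key = "frequent" if len(lengths) > threshold else "occasional"
        groups[key].extend(lengths)
    # Skip a group entirely if no requester falls into it.
    return {name: median(vals) for name, vals in groups.items() if vals}


# Toy data, made up for illustration.
toy_requests = ["a", "a b", "a b c", "a b c d", "a b c d e"]
toy_records = [("r1", "one two three")] * 3 + [("r2", "one")]

print(length_quartiles(toy_requests))
print(median_length_by_group(toy_records, threshold=2))
```

Splitting on whitespace is a crude word count, so numbers from this sketch may differ slightly from the ones reported above.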

Discussion of the analysis of requests has moved to issue #469

vz3 commented 9 years ago

I emailed 10 DOJ requests to the group