18F / 2015-foia-hub

A consolidated FOIA request hub.

Running the document processing on documents in an S3 bucket #680

Open khandelwal opened 9 years ago

khandelwal commented 9 years ago

We probably need a script that runs the document processing toolkit over the documents in an S3 bucket (which is where they will end up after we've collected them) and skips any that have already been processed.

Processing in this case mostly means extracting the text.
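
A rough sketch of how the "haven't been processed before" check might look, in Python. Everything here is an assumption for illustration, not the toolkit's actual behavior: it assumes the extracted text is stored next to each document under the same key with a `.txt` extension, and that the documents of interest are PDF and Word files. The helper names (`text_key_for`, `find_unprocessed`, `list_bucket_keys`) are hypothetical. Only the bucket listing uses boto3.

```python
import os

# Assumed set of document types worth processing (not confirmed by this thread).
DOC_EXTENSIONS = {".pdf", ".doc", ".docx"}


def text_key_for(doc_key):
    """Hypothetical convention: extracted text lives next to the
    document under the same name with a .txt extension."""
    base, _ = os.path.splitext(doc_key)
    return base + ".txt"


def find_unprocessed(keys):
    """Return the document keys that have no matching .txt key yet."""
    existing = set(keys)
    return sorted(
        key
        for key in existing
        if os.path.splitext(key)[1].lower() in DOC_EXTENSIONS
        and text_key_for(key) not in existing
    )


def list_bucket_keys(bucket, prefix=""):
    """List every object key in the bucket, following pagination."""
    import boto3  # imported lazily so the helpers above work without it

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

With credentials configured, something like `find_unprocessed(list_bucket_keys("some-bucket"))` would give the list of documents still waiting for text extraction; each of those could then be downloaded and handed to the toolkit. Keeping the "what is left to do" logic separate from the S3 calls makes it easy to test without touching the bucket.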

geramirez commented 9 years ago

We now have an S3 bucket, 18f-foia-doc-storage, with 2000+ docs on keystone.

geramirez commented 9 years ago

Upgrades are here: https://github.com/18F/doc_processing_toolkit/pulls