anthonydb / pneumatic

pneumatic is a bulk-upload library for DocumentCloud.
MIT License

Add way to add each document's `file_hash` and `pages` value to database #11

Open anthonydb opened 8 years ago

anthonydb commented 8 years ago

Upon upload, the DocumentCloud API response does not include values for `file_hash` or `pages`, probably because they are calculated while the document is processed and are not yet available when the file is dropped off.

I'd like to add a function in `db.py` that walks through the database of uploaded files and retrieves those values for each doc. It should use multiprocessing on supported platforms.
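As a rough illustration of that idea, and not the code that was eventually committed, the sketch below fills in missing values for every row that lacks them. The `uploads` table, its column names, and both function names are assumptions made for the example, and `fetch_details` is a hypothetical stand-in for the real DocumentCloud lookup. It also swaps in a thread pool where the issue asks for multiprocessing, since the lookups would be network-bound; a process pool could be substituted.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def fetch_details(doc_id):
    """Hypothetical stand-in for a DocumentCloud lookup.

    A real version would request the document's metadata over the network
    and return (doc_id, file_hash, pages).
    """
    return doc_id, "hash-" + doc_id, 1


def fill_missing_values(conn, fetch=fetch_details, workers=4):
    """Fill in file_hash and pages for rows that lack them.

    Returns the number of rows updated. Table and column names are assumed.
    """
    rows = conn.execute(
        "SELECT id FROM uploads WHERE file_hash IS NULL"
    ).fetchall()
    ids = [row[0] for row in rows]

    # Run the lookups concurrently. All database writes stay on the calling
    # thread, because a sqlite3 connection should not be shared by workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetch, ids))

    conn.executemany(
        "UPDATE uploads SET file_hash = ?, pages = ? WHERE id = ?",
        [(file_hash, pages, doc_id) for doc_id, file_hash, pages in results],
    )
    conn.commit()
    return len(results)
```

Passing the lookup in as `fetch` keeps the database walk testable without network access.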

anthonydb commented 8 years ago

Closed via https://github.com/anthonydb/pneumatic/commit/d71a56a098865d4fb0e50bb36c6b78a1cafebf4c

anthonydb commented 8 years ago

Reopening to remind myself that the `update_processed_files` method needs to write something to the database when a file is not found.
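One possible shape for that, sketched under assumptions: the `uploads` table, a `status` column, the `'not_found'` and `'processed'` values, and the helper name `record_lookup` are all hypothetical and are not taken from pneumatic's actual schema.

```python
import sqlite3


def record_lookup(conn, doc_id, details):
    """Store lookup results, or flag the row when the document was not found.

    `details` is a (file_hash, pages) tuple, or None for a missing document.
    Table, column, and status names are assumptions for this sketch.
    """
    if details is None:
        # Leave file_hash and pages untouched and mark the row instead,
        # so later runs can skip or retry it deliberately.
        conn.execute(
            "UPDATE uploads SET status = 'not_found' WHERE id = ?",
            (doc_id,),
        )
    else:
        file_hash, pages = details
        conn.execute(
            "UPDATE uploads SET file_hash = ?, pages = ?, status = 'processed' "
            "WHERE id = ?",
            (file_hash, pages, doc_id),
        )
    conn.commit()
```

Writing an explicit status, instead of leaving the fields empty, distinguishes "not found" from "not yet checked".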