Upon upload, the DocumentCloud API response does not include values for `file_hash` or `pages`, probably because those get calculated while the document is processed and are not available when the file is dropped off.
I'd like to add a function in `db.py` that walks through the database of uploaded files and retrieves those values for each doc. It should use multiprocessing on supported platforms.
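A rough sketch of what such a function could look like, to make the idea concrete. Everything named here is an assumption for illustration: `fill_missing_values`, the placeholder `fetch_details`, and the record layout (a list of dicts with `id`, `file_hash`, and `pages` keys) are hypothetical and not taken from the existing `db.py`. The DocumentCloud lookup is deliberately left as a stub.

```python
import multiprocessing


def fetch_details(doc_id):
    """Hypothetical placeholder for the per-document lookup.

    A real version would ask DocumentCloud for the processed document and
    return a dict with "file_hash" and "pages" keys.
    """
    raise NotImplementedError("replace with a real DocumentCloud lookup")


def fill_missing_values(records, fetch=fetch_details, processes=None):
    """Return copies of records with file_hash and pages filled in.

    records:   list of dicts with an "id" key (assumed layout).
    fetch:     callable taking a document id and returning a dict with
               "file_hash" and "pages". It must be a top-level function
               so it can be pickled for the worker processes.
    processes: worker count; 1 forces the serial path, None lets
               multiprocessing pick a default.
    """
    # Only look up documents that are still missing one of the values.
    ids = [
        r["id"]
        for r in records
        if r.get("file_hash") is None or r.get("pages") is None
    ]

    pool = None
    if processes != 1 and len(ids) > 1:
        try:
            pool = multiprocessing.Pool(processes)
        except (ImportError, OSError):
            # Platform without working multiprocessing: fall back to serial.
            pool = None

    if pool is None:
        results = [fetch(doc_id) for doc_id in ids]
    else:
        with pool:
            results = pool.map(fetch, ids)

    found = dict(zip(ids, results))

    updated = []
    for r in records:
        if r["id"] in found:
            details = found[r["id"]]
            # Copy the record rather than mutating the caller's data.
            r = {**r, "file_hash": details["file_hash"], "pages": details["pages"]}
        updated.append(r)
    return updated
```

The lookups are mostly network-bound, so a thread pool might serve just as well. The sketch uses `multiprocessing.Pool` only because that is what this issue asks for. Writing the updated records back to the database is left out, since that depends on how `db.py` stores them.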