alephdata / ingest-file

Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.
GNU Affero General Public License v3.0

Configurable or reduced timeout in support/convert.py #573

Closed: adryd325 closed this issue 6 months ago

adryd325 commented 7 months ago

We're getting files that stay stuck converting for the entire hour until the timeout kills them. It would be nice if the timeout were configurable to something lower, say 120 seconds instead of an hour.

stchris commented 7 months ago

Hi @adryd325! The timeout for the document conversion is configurable along with the number of retries: https://github.com/alephdata/ingest-file/blob/8f5fedeb6a7d0eaf3d41d10dd2bc42a31afa72b5/ingestors/settings.py#L7 Is this what you were looking for?

adryd325 commented 7 months ago

I'll have to give it a try. The variable I was thinking of was:

https://github.com/alephdata/ingest-file/blob/8f5fedeb6a7d0eaf3d41d10dd2bc42a31afa72b5/ingestors/support/convert.py#L15C26-L15C26

I found it when I saw LibreOffice PDF conversions hanging and looked for where the processes were spawned.

adryd325 commented 7 months ago

Doing a quick search in the repo, I'm not able to find any usages of "CONVERT_TIMEOUT"
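
For illustration, here is a minimal sketch of how a configurable timeout could be wired into a spawned conversion process. It is not the ingest-file implementation: the helper names `read_timeout` and `run_with_timeout` are hypothetical, reading `CONVERT_TIMEOUT` from the environment is an assumption, and the one-hour default only mirrors the behaviour described above.

```python
import os
import subprocess
import sys


def read_timeout(env=os.environ, name="CONVERT_TIMEOUT", default=3600):
    """Return a timeout in seconds from the environment (hypothetical helper).

    Falls back to the default when the variable is unset, is not an
    integer, or is not positive.
    """
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def run_with_timeout(args, timeout):
    """Run a command; return True on success, False on failure or timeout."""
    try:
        # subprocess.run kills the direct child once the timeout expires.
        subprocess.run(args, timeout=timeout, check=True)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False


if __name__ == "__main__":
    # Stand-in for a hanging conversion: a child that sleeps for 30 seconds,
    # cut off after 1 second.
    hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(hanging, timeout=1))
```

With something like this, setting `CONVERT_TIMEOUT=120` in the worker's environment would cut a hanging conversion off after two minutes. One caveat: `subprocess.run` only kills the process it started directly, so any helper processes that the converter spawns may need separate cleanup.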