4teamwork / ftw.tika

This product integrates Apache Tika for full text indexing with Plone.
4 stars 1 forks source link

Remove local fallback when there was an error. #33

Closed jone closed 6 years ago

jone commented 6 years ago

When the tika server requires too much memory for extracting the text of a document, it could fail (maybe also because of a connection timeout). This did lead to a local fallback, causing an uncontrolled process to consume too much memory.

It is therefore better to not have automatic fallbacks.

maethu commented 6 years ago

@jone Do you got something in mind for local development? IMHO for my local dev installations the fallback is useful. But this is just my laziness, I mean I'm able to boot a local tika service 😄

jone commented 6 years ago

@lukasgraf @maethu @buchi I have updated the code: I'm now doing a fallback when we cannot connect at all (e.g. sever not running), and abort with empty-string when there was an error (connection timeout; bad statusline because Memmon killed the process; etc).

lukasgraf commented 6 years ago

Honestly, I would probably have written the logic the other way around:

try:
    # ...
except TimeoutError:
    log.error(...)
    return u''
except EverythingElse:
    fallback()

This way it's much easier to read that a special handling for timeouts happens, and there's less risk of skipping the fallback in cases other than timeouts where we don't want to skip it.

jone commented 6 years ago

I have updated the code: changed the log text; fixed exc error in first except.

Regarding switching the exceptions I don't agree:

So I'd like it either be this way or I need to make 100% sure that local conversion cannot happen.

The local conversions currently take down the VM completely.

jone commented 6 years ago

@lukasgraf As this was quite urgent I shipped it in the current state. We can discuss this again and change it if we agree on another solution.