elastic / data-extraction-service


Add exceptions for tika and missing EOF lines #11

Closed navarone-feekery closed 1 year ago

navarone-feekery commented 1 year ago

Related to https://github.com/elastic/enterprise-search-team/issues/4576

Adds some simple exception passing for tika-server proxy.

Also adds missing EOF lines.

Notes on response status (and slight ramble):

Because of how proxy passing works, it's not possible to update the nginx status for multipart requests inside the body_filter_by_lua block, so these exceptions will still return as 200s. nginx sends the response headers (including the status) before the response body, so by the time we have the full body contents, the headers have already gone back to the client. We don't know whether an error has occurred until we have all of the contents, so unless an internal server error happens inside tika, the response is always 200 here.
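
A rough sketch of the limitation, assuming an OpenResty-style config. The location path, the tika-server address, and the `TikaException` marker used to detect a failure are hypothetical and not taken from this PR:

```nginx
location /extract_text/ {
    # Hypothetical tika-server address
    proxy_pass http://127.0.0.1:9998/tika/;

    header_filter_by_lua_block {
        -- The body may be rewritten below, so drop the upstream Content-Length
        ngx.header.content_length = nil
    }

    body_filter_by_lua_block {
        -- ngx.arg[1] is the current chunk, ngx.arg[2] is true on the last chunk
        local chunk, eof = ngx.arg[1], ngx.arg[2]

        -- Buffer chunks until the final one arrives
        ngx.ctx.body = (ngx.ctx.body or "") .. (chunk or "")
        if not eof then
            ngx.arg[1] = nil
            return
        end

        -- Hypothetical error check; the real tika error format is an assumption
        if ngx.ctx.body:find("TikaException", 1, true) then
            -- The body can still be replaced at this point, but assigning
            -- ngx.status would have no effect: the headers were already sent
            ngx.arg[1] = '{"error": "extraction failed"}'
        else
            ngx.arg[1] = ngx.ctx.body
        end
    }
}
```

The body filter only runs after the header filter has finished, which is why the status stays at whatever the upstream returned.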

This could be bypassed by not doing a proxy pass and instead manually requesting tika with a rewrite_by_lua block. But that is a lot of work and the current approach already works, so it's maybe something to think about for a v2.
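
For reference, the manual-request alternative might look roughly like the following. This is only a sketch under several assumptions: it relies on the third-party lua-resty-http library, and the location path, tika-server address, error marker, and chosen status codes are all hypothetical:

```nginx
location /extract_text/ {
    rewrite_by_lua_block {
        -- Assumes the third-party lua-resty-http library is installed
        local http = require("resty.http")

        ngx.req.read_body()
        -- Returns nil if the body was spooled to a temp file; not handled here
        local body = ngx.req.get_body_data()

        local httpc = http.new()
        -- Hypothetical tika-server address
        local res, err = httpc:request_uri("http://127.0.0.1:9998/tika", {
            method = ngx.req.get_method(),
            body = body,
            headers = {
                ["Content-Type"] = ngx.req.get_headers()["Content-Type"],
            },
        })

        if not res then
            ngx.status = ngx.HTTP_BAD_GATEWAY
            ngx.say(err)
            return ngx.exit(ngx.HTTP_OK)
        end

        -- The full body is available before anything is sent to the client,
        -- so the status can still be chosen based on its contents
        local failed = res.body:find("TikaException", 1, true) ~= nil
        ngx.status = failed and ngx.HTTP_INTERNAL_SERVER_ERROR or res.status
        ngx.print(res.body)
        return ngx.exit(ngx.HTTP_OK)
    }
}
```

The trade-off is that everything proxy_pass does for free (header forwarding, streaming, timeouts) would have to be handled by hand, which is the "lot of work" part.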

Checklists

Pre-Review Checklist