Closed navarone-feekery closed 1 year ago
@artem-shelkovnikov yes, that's correct. The rmeta endpoints don't support streaming, so I'm using a different endpoint that does. It returns the same extracted content; only the formatting of the response is different. (rmeta also returned a lot of auxiliary information, like file type and size, but we aren't using that information, so it's okay to lose it.)
Related to https://github.com/elastic/enterprise-search-team/issues/5048
The current iteration of content extraction requires the sender to send the file as a multipart request. This is a problem for large files: without chunking the data, the whole file has to be loaded into memory before it can be sent.
These changes alter the proxy endpoints to pass to /tika/text instead of rmeta/*. Unfortunately, rmeta/* endpoints require multipart requests. Here is a /tika/text example from docs for reference.
The following things are impacted by this:
/var/log/tikaserver.log so someone can check that if they want more information
Checklists
Pre-Review Checklist
v7.13.2, v7.14.0, v8.0.0)
Related Pull Requests
https://github.com/elastic/connectors-python/pull/1158