4teamwork / ftw.solr

Solr integration for Plone

Solr 7.3 compatibility: Fix requests to ExtractingRequestHandler #118

Closed lukasgraf closed 6 years ago

lukasgraf commented 6 years ago

Set the application/x-www-form-urlencoded Content-Type for requests to the /update/extract endpoint to ensure compatibility with Solr 7.3.

This is needed because something subtle changed between Solr 7.2 and 7.3: the /update/extract endpoint now honors the stream.file parameter only if the request either sets the application/x-www-form-urlencoded content type or uses a GET instead of a POST.

With just a POST and no urlencoded content type, 7.3 will attempt to read the content stream from the POST body (ignoring stream.file), resulting in an empty stream being passed to Tika for extraction. This will then cause Tika to fail with a ZeroByteFileException.

I tested this invocation of the ExtractingRequestHandler for both Solr 7.2 and 7.3, so it works for both.
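
For illustration, here is a minimal sketch of what such a request could look like, using only the Python standard library. It is not the actual diff of this PR: the helper name `build_extract_request`, the core URL and the file path are hypothetical, and placing `stream.file` in the form-encoded body (rather than in the query string) is an assumption of this sketch. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

FORM_URLENCODED = "application/x-www-form-urlencoded"


def build_extract_request(core_url, file_path, extra_params=None):
    """Build (but do not send) a POST to the /update/extract endpoint.

    Hypothetical helper: `core_url` is the base URL of a Solr core and
    `file_path` is a path readable by the Solr server itself.
    """
    params = {"stream.file": file_path}
    params.update(extra_params or {})
    # Assumption: the parameters travel in the form-encoded POST body.
    body = urlencode(params).encode("utf-8")
    return Request(
        core_url.rstrip("/") + "/update/extract",
        data=body,
        # The explicit content type is the point of the change described above.
        headers={"Content-Type": FORM_URLENCODED},
        method="POST",
    )


# Hypothetical core URL and file path, for demonstration only.
request = build_extract_request(
    "http://localhost:8983/solr/mycore/", "/tmp/doc.pdf"
)
```

Passing the resulting object to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without a Solr server.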


Alternatively, just switching from POST to GET also does the trick (and also works for both 7.2 and 7.3). The documentation on Content Stream Sources isn't entirely clear on whether GET or POST should be used if stream.file is the intended stream source, so I went with setting the application/x-www-form-urlencoded header because the documentation is slightly less ambiguous on that.
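
The GET alternative could be sketched as follows, again with the standard library only. The helper name `build_extract_get_request`, the core URL and the file path are hypothetical; with a GET there is no request body, so `stream.file` goes into the query string. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_get_request(core_url, file_path):
    """Build (but do not send) a GET to the /update/extract endpoint.

    Hypothetical helper: a GET carries no body, so stream.file is passed
    as a query string parameter.
    """
    query = urlencode({"stream.file": file_path})
    url = core_url.rstrip("/") + "/update/extract?" + query
    return Request(url, method="GET")


# Hypothetical core URL and file path, for demonstration only.
get_request = build_extract_get_request(
    "http://localhost:8983/solr/mycore", "/tmp/doc.pdf"
)
```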