Set the application/x-www-form-urlencoded Content-Type for requests to the /update/extract endpoint to ensure compatibility with Solr 7.3.
This is needed because something subtle changed between Solr 7.2 and 7.3: the /update/extract endpoint now honors the stream.file parameter only if you either set the application/x-www-form-urlencoded content type or use a GET instead of a POST.
With just a POST and no urlencoded content type, 7.3 will attempt to read the content stream from the POST body (ignoring stream.file), resulting in an empty stream being passed to Tika for extraction. This will then cause Tika to fail with a ZeroByteFileException.
I tested this invocation of the ExtractingRequestHandler against both Solr 7.2 and 7.3, and it works on both.
Alternatively, just switching from POST to GET also does the trick (and also works for both 7.2 and 7.3). The documentation on Content Stream Sources isn't entirely clear on whether GET or POST should be used if stream.file is the intended stream source, so I went with setting the application/x-www-form-urlencoded header because the documentation is slightly less ambiguous on that.
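For illustration, here is a minimal sketch of the two request shapes described above, in Python using only the standard library. It only builds the requests and does not send them. The host, port, core name (`mycore`), the helper name `build_extract_request`, and the `literal.id` / `commit` parameters are assumptions chosen for the example, not taken from this change; sending the parameters in the form-encoded body (rather than the query string) is likewise an assumption about how a client might do it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint for illustration; adjust host, port, and core name.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/mycore/update/extract"


def build_extract_request(file_path, doc_id, use_get=False):
    """Build (but do not send) a request asking Solr to extract a server-side file."""
    params = urlencode({
        "stream.file": file_path,  # path as seen by the Solr server
        "literal.id": doc_id,      # assumed unique-key field for the example
        "commit": "true",
    })
    if use_get:
        # GET variant: parameters travel in the query string, no request body.
        return Request(f"{SOLR_EXTRACT_URL}?{params}", method="GET")
    # POST variant: explicit form-urlencoded Content-Type, so the body is
    # treated as parameters instead of as the content stream itself.
    return Request(
        SOLR_EXTRACT_URL,
        data=params.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_extract_request("/data/docs/report.pdf", "doc1")
    print(req.get_method(), req.full_url)
    print(req.header_items())
```

A request built this way can be passed to `urllib.request.urlopen` to send it. A plain POST without the header is the case that, per the description above, ends with Tika receiving an empty stream on 7.3.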