Add image content extraction

Problem Description

Apache Tika is currently our only extraction tool. As we use it today, it does not extract text content from images.

Proposed Solution

Integrating Tesseract OCR (or a similar OCR engine) would let us add image content extraction to the extraction service.
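
To make the proposal concrete, here is a minimal sketch of how the service could shell out to the Tesseract command-line tool. It assumes the `tesseract` binary is installed and on the PATH; the function names `build_tesseract_command` and `extract_image_text` are hypothetical and not part of the existing service.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(image_path: str, language: str = "eng") -> List[str]:
    """Build the argument list for one Tesseract CLI run (hypothetical helper)."""
    # Passing "stdout" as the output base makes tesseract print the
    # recognized text to standard output instead of writing a .txt file.
    return ["tesseract", image_path, "stdout", "-l", language]


def extract_image_text(image_path: str, language: str = "eng") -> Optional[str]:
    """Return the text recognized in the image, or None if tesseract is missing."""
    if shutil.which("tesseract") is None:
        # Assumption: a missing binary is reported as None rather than an error.
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, language),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits non-zero
    )
    return result.stdout
```

Keeping the command construction separate from the subprocess call would make it easier to swap in a different OCR tool if the investigation favors one.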

Alternatives

Other OCR tools are also acceptable, provided we evaluate them first.

Additional Context

Image content should be extractable from the same endpoint that Tika extraction uses, /extract_text. This would require adding a parameter for the extraction type so that we can differentiate between Tika and the image extractor. The response format should be identical for both.
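
The routing described above could look roughly like the following sketch. Everything named here is an assumption for illustration: the parameter name `extraction_type`, its values `"tika"` and `"image"`, the default of `"tika"`, and the `extracted_text` response key are hypothetical, and the two extractor functions are placeholders rather than real Tika or OCR calls.

```python
from typing import Callable, Dict


def extract_with_tika(data: bytes) -> str:
    # Placeholder standing in for the existing Tika-based extraction.
    return "text from tika"


def extract_with_ocr(data: bytes) -> str:
    # Placeholder standing in for the proposed image (OCR) extraction.
    return "text from ocr"


# Hypothetical mapping from the new parameter's value to an extractor.
EXTRACTORS: Dict[str, Callable[[bytes], str]] = {
    "tika": extract_with_tika,
    "image": extract_with_ocr,
}


def extract_text(data: bytes, extraction_type: str = "tika") -> Dict[str, str]:
    """Handle an /extract_text request body and return the response payload."""
    try:
        extractor = EXTRACTORS[extraction_type]
    except KeyError:
        raise ValueError(f"unsupported extraction type: {extraction_type!r}")
    # Both extractors feed the same response shape, so callers do not need
    # to know which tool produced the text.
    return {"extracted_text": extractor(data)}
```

Defaulting the parameter to the Tika extractor in this sketch would keep existing callers working without changes, though the actual default is a design decision for the implementation.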

elastic / data-extraction-service