Include Tika in the AWS connector so SUPPORTED_FILETYPE can include CSV, JSON, and XML files

Problem Description

The new AWS connector connects to S3, where people typically store standard data files, e.g., log.json, table.csv, and old.xml. Our currently supported types are aimed at reading programming-language source files, but S3 isn't the usual place to keep Python, Ruby, and shell scripts.

Proposed Solution

Since Tika is used throughout Enterprise Search to extract content from many file types, we should use it in the AWS connector so that those file types are supported here as well.
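A minimal sketch of the idea, assuming the connector filters S3 object keys by extension and hands the raw bytes of each matching object to Tika. The list contents, `extract_text`, and `tika_extract` are hypothetical names for illustration, not the connector's actual code; `tika_extract` assumes the `tika` Python package (`tika.parser.from_buffer`), which needs Java and a running or auto-started Tika server.

```python
import os
from typing import Callable, Optional

# Hypothetical extension list; the connector's real SUPPORTED_FILETYPE
# values are not reproduced here. CSV, JSON, and XML are the proposed additions.
SUPPORTED_FILETYPE = [".py", ".rb", ".sh", ".csv", ".json", ".xml"]


def tika_extract(data: bytes) -> Optional[str]:
    """Extract text with the tika-python package (assumes a Tika server is available)."""
    from tika import parser  # imported lazily so this module loads without tika installed

    parsed = parser.from_buffer(data)
    return parsed.get("content")


def extract_text(
    key: str,
    data: bytes,
    extractor: Callable[[bytes], Optional[str]] = tika_extract,
) -> Optional[str]:
    """Return extracted text for supported S3 object keys, or None to skip the object."""
    _, ext = os.path.splitext(key.lower())
    if ext not in SUPPORTED_FILETYPE:
        return None  # unsupported types (e.g. .avro) are skipped
    content = extractor(data)
    return content.strip() if content else None
```

The `extractor` parameter is there so the Tika call can be swapped out (or stubbed in tests) without changing the extension filtering.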

Alternatives

An alternative is to pull in each file's binary representation and use an ingest pipeline (backed by Tika) to perform the extraction.
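A rough sketch of this alternative, assuming Elasticsearch's ingest attachment processor (which is built on Apache Tika) does the extraction and the connector only ships base64-encoded bytes. The pipeline ID and the `filename`/`data` field names are hypothetical choices for illustration.

```python
import base64

# Hypothetical pipeline ID and field names; the attachment processor itself
# is Elasticsearch's Tika-based ingest processor.
PIPELINE_ID = "s3-attachment"
PIPELINE_BODY = {
    "description": "Extract text from S3 objects with the attachment processor",
    "processors": [{"attachment": {"field": "data"}}],
}


def build_document(key: str, raw: bytes) -> dict:
    """Wrap a raw S3 object as a document for the attachment pipeline."""
    return {
        "filename": key,
        # The attachment processor expects the source field to be base64-encoded.
        "data": base64.b64encode(raw).decode("ascii"),
    }
```

`PIPELINE_BODY` would be registered with `PUT _ingest/pipeline/s3-attachment`, and each document indexed with `?pipeline=s3-attachment`. The trade-off is that extraction load moves from the connector to the Elasticsearch ingest nodes.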

Additional Context

This is an awesome connector, and so is the directory one!

Someone asked about Avro files stored in S3, so I assume the list of file types people will want supported is endless. For something like Avro, I would assume we should build an ingest pipeline to handle these non-Tika types.

elastic / connectors