Problem Description
The new AWS connector connects to S3, where people typically store standard data file types, e.g., log.json, table.csv, and old.xml files. The file types we currently support are programming-language files, but S3 isn't the normal place to keep your Python, Ruby, and shell scripts.
Proposed Solution
Since Tika is already used throughout enterprise search to handle multiple file types, we should use it within the AWS connector so we can expose those file types here as well.
Alternatives
The alternative is to pull in the binary representation and use an ingest pipeline (using Tika) to perform the extraction.
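As a rough sketch of this alternative: Elasticsearch ships an `attachment` ingest processor that wraps Tika and extracts content from a base64-encoded binary field. The pipeline name and field name below are illustrative assumptions, not part of the connector today:

```
PUT _ingest/pipeline/s3_attachment
{
  "description": "Extract text from binary S3 objects via Tika (attachment processor)",
  "processors": [
    {
      "attachment": {
        "field": "data"
      }
    }
  ]
}
```

The connector would then index the raw object bytes (base64-encoded) into `data`, and the pipeline would populate the extracted content and metadata at ingest time.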
Additional Context
This is an awesome connector + the directory one ++
Someone asked about Avro files stored in S3, so I assume the list of file types people want supported is endless.
For something like Avro, I would assume we should build an ingest pipeline to handle these non-Tika types.
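To route non-Tika formats to such a pipeline, the connector would first need to recognize them. As a minimal sketch (file-signature detection is my assumption here, not something the connector does today), an Avro object container file can be identified by its 4-byte magic, `Obj` followed by 0x01:

```python
def is_avro(path: str) -> bool:
    """Return True if the file starts with the Avro object
    container magic bytes: b"Obj" followed by 0x01."""
    with open(path, "rb") as f:
        return f.read(4) == b"Obj\x01"
```

A file that merely has an `.avro` extension but lacks this header would be rejected, so detection by signature is more reliable than detection by filename.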