elastic / data-extraction-service


Include PDF metadata (like 'title') #24

Open seanstory opened 10 months ago

Problem Description

The Ingest Attachment Processor can extract metadata as well as text. While we don't use that metadata in our default pipelines, customers can customize a pipeline to capture it. However, the extraction service doesn't expose any of it.
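
For reference, here is a rough sketch of the kind of customized pipeline meant above. It builds an ingest pipeline body whose attachment processor is limited to a few properties via its `properties` option. The helper name `attachment_pipeline` and the source field name `data` are assumptions for illustration only.

```python
import json


def attachment_pipeline(properties):
    """Build an ingest pipeline body that restricts the attachment
    processor to the given properties (text and/or metadata)."""
    return {
        "description": "Extract text and selected metadata from a binary field",
        "processors": [
            {
                "attachment": {
                    # Assumed name of the field holding the base64-encoded file.
                    "field": "data",
                    # Only these properties are stored on the document.
                    "properties": list(properties),
                }
            }
        ],
    }


pipeline = attachment_pipeline(["content", "title", "author", "date"])
print(json.dumps(pipeline, indent=2))
```

With a pipeline like this, `title`, `author`, and `date` end up alongside `content` under the `attachment` field, which is the metadata the extraction service currently has no way to return.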

Proposed Solution

Expose (common) metadata through generic and optional output fields.
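
To make the proposal concrete, here is one possible shape, not a design decision. Everything in this sketch is hypothetical: the `extracted_text` and `metadata` keys, the `include_metadata` flag, the `build_response` helper, and the list of "common" keys are all assumptions, not the service's actual API.

```python
from typing import Any, Dict, Optional

# Hypothetical allow-list of "common" metadata keys; not taken from the service.
COMMON_METADATA_KEYS = ("title", "author", "date", "keywords", "content_type")


def build_response(
    extracted_text: str,
    raw_metadata: Optional[Dict[str, Any]] = None,
    include_metadata: bool = False,
) -> Dict[str, Any]:
    """Return the response body, adding a generic `metadata` object only
    when the caller asked for it and at least one common key is present."""
    response: Dict[str, Any] = {"extracted_text": extracted_text}
    if include_metadata and raw_metadata:
        # Keep only known keys that actually have a value.
        metadata = {
            key: raw_metadata[key]
            for key in COMMON_METADATA_KEYS
            if raw_metadata.get(key) is not None
        }
        if metadata:
            response["metadata"] = metadata
    return response


# Parser-specific keys (here "pdf:producer") are dropped; "title" is kept.
print(build_response("Hello", {"title": "Report", "pdf:producer": "x"}, include_metadata=True))
```

Keeping the field optional means existing callers that only read the text would see no change in the response.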

Alternatives

???

Additional Context

See https://github.com/elastic/enterprise-search-team/issues/2744