awslabs / project-lakechain

:zap: Cloud-native, AI-powered, document processing pipelines on AWS.
https://awslabs.github.io/project-lakechain/
Apache License 2.0

Feature request: Implement an E-mail text processor #10

Closed by HQarroum 5 months ago

HQarroum commented 5 months ago

Use case

Parse e-mail documents at scale, including .eml and .msg files.

Solution/User Experience

Narrative

The e-mail text processor makes it easy to extract the textual content of e-mail documents and pipe it to other middlewares for further processing. This middleware can extract text, HTML, and structured JSON from e-mail documents.
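To make the three output types concrete, here is a minimal sketch of the extraction step using only Python's standard `email` package. It is not the Lakechain middleware itself: the function name `extract_email` and the shape of the returned dictionary are assumptions made for illustration, and it only handles `.eml` (RFC 5322/MIME) input, since `.msg` files would need a separate parser.

```python
import json
from email import message_from_bytes
from email.policy import default


def _header(value):
    # Header objects are str subclasses; normalize to plain str (or None).
    return str(value) if value is not None else None


def extract_email(raw: bytes) -> dict:
    """Hypothetical helper: extract text, HTML and metadata from .eml bytes."""
    msg = message_from_bytes(raw, policy=default)

    # get_body() walks the MIME tree and returns the best matching part,
    # or None when the message has no part of the requested type.
    text_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))

    return {
        "subject": _header(msg["subject"]),
        "from": _header(msg["from"]),
        "to": _header(msg["to"]),
        "date": _header(msg["date"]),
        "text": text_part.get_content() if text_part else None,
        "html": html_part.get_content() if html_part else None,
        "attachments": [a.get_filename() for a in msg.iter_attachments()],
    }


def to_json(raw: bytes) -> str:
    # Structured JSON output, one document per e-mail.
    return json.dumps(extract_email(raw), indent=2)
```

A caller could select the `text`, `html`, or full JSON form depending on what the next stage of the pipeline expects, which mirrors the three output formats described above.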

Alternative solutions

No response