JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing
https://sparknlp.org/
Apache License 2.0

[SPARKNLP-1093] Adding support to read Email files #14455

Open danilojsl opened 1 week ago


Description

This pull request adds the ability to read Email files and parse them into a structured Spark DataFrame. Once the emails are in a DataFrame, their content can be processed and analyzed at scale and passed directly to Spark NLP pipelines for downstream natural language processing tasks.

Key Changes

Added sparknlp.read().email() method: accepts a path to a single email file or to a directory of email files and parses the email content into a Spark DataFrame.

Support for varied sources: the method handles email files stored both in local directories and on distributed file systems.
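Running sparknlp.read().email() itself requires a Spark session and the code from this PR, so the following is only a standalone, stdlib-only sketch of the same idea: it turns raw email files into flat, DataFrame-like rows using Python's built-in `email` package. It is not the Spark NLP implementation. The helper names (`parse_email_bytes`, `read_emails`) and the row fields (`subject`, `from`, `to`, `body`, `attachments`, `path`) are hypothetical and are not the schema this PR produces. Distributed file systems are out of scope for this local sketch.

```python
"""Stdlib-only sketch: turn raw email files into flat, DataFrame-like rows.

NOT the Spark NLP implementation. Helper names and row fields are
hypothetical, chosen only for illustration.
"""
from email import policy
from email.parser import BytesParser
from pathlib import Path


def parse_email_bytes(raw: bytes) -> dict:
    """Parse one RFC 5322 message into a flat dict (hypothetical schema)."""
    # policy.default yields an EmailMessage, which provides get_body()
    # and iter_attachments().
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # Prefer the plain-text body; fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    # Collect the filenames of parts that are not body candidates.
    attachments = [part.get_filename() for part in msg.iter_attachments()]
    return {
        "subject": str(msg.get("Subject", "")),
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "body": body.strip(),
        "attachments": attachments,
    }


def read_emails(path: str) -> list[dict]:
    """Accept a single .eml file or a directory of .eml files."""
    p = Path(path)
    files = sorted(p.glob("*.eml")) if p.is_dir() else [p]
    rows = []
    for f in files:
        row = parse_email_bytes(f.read_bytes())
        row["path"] = str(f)  # keep provenance, similar to a file-path column
        rows.append(row)
    return rows
```

Each dictionary corresponds to one row, so a list of them could be handed to `spark.createDataFrame` if a Spark session is available. Going by the PR description, the equivalent call with the PR applied would take a single path, along the lines of `sparknlp.read().email("path/to/email/files")`.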

Important: Please do not merge this PR until PR #14449 is merged.
