DS4SD / docling

Get your documents ready for gen AI
https://ds4sd.github.io/docling
MIT License
10.52k stars, 509 forks

Standardized Access to Common Email and Calendar Formats #327

Open ByteMeFree opened 1 week ago

ByteMeFree commented 1 week ago

Requested feature

I propose adding support for common email formats (.msg, .eml) and calendar files (.ics), so that users can access and manage their email and meeting data in a standardized format. For emails, the output should clearly expose the essential fields: sender, recipient, CC, BCC, date, body, signature, attachments, and mail history. For calendar files, it should include the equivalent details: meeting participants, date, time, agenda, and any attachments related to the meeting.

Alternatives

I have considered the following:

Manual export: users can manually export emails and calendar events, but this process is time-consuming and error-prone.
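To make the requested fields concrete for the .eml case, here is a minimal sketch that uses only Python's standard-library `email` package. It is not docling's API and not a proposed backend: `EmailRecord`, `parse_eml`, and `_addresses` are hypothetical names chosen for illustration, and .msg and .ics files are not covered.

```python
from dataclasses import dataclass, field
from datetime import datetime
from email import message_from_bytes, policy
from email.utils import getaddresses
from typing import Optional


@dataclass
class EmailRecord:
    """Hypothetical container for the requested fields (not a docling type)."""
    sender: str
    to: list[str]
    cc: list[str]
    bcc: list[str]
    date: Optional[datetime]
    subject: str
    body: str
    attachments: list[str] = field(default_factory=list)


def _addresses(msg, header: str) -> list[str]:
    # Collect the bare addresses from every occurrence of the given header.
    values = [str(v) for v in msg.get_all(header, [])]
    return [addr for _name, addr in getaddresses(values) if addr]


def parse_eml(raw: bytes) -> EmailRecord:
    # policy.default yields an EmailMessage with the modern accessor methods.
    msg = message_from_bytes(raw, policy=policy.default)

    # Prefer the plain-text body and fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    # The parsed Date header exposes a datetime (None if it could not be parsed).
    date_header = msg["Date"]
    date = date_header.datetime if date_header is not None else None

    return EmailRecord(
        sender=str(msg["From"] or ""),
        to=_addresses(msg, "To"),
        cc=_addresses(msg, "Cc"),
        bcc=_addresses(msg, "Bcc"),
        date=date,
        subject=str(msg["Subject"] or ""),
        body=body,
        # Attachments without a filename are recorded as an empty string.
        attachments=[part.get_filename() or "" for part in msg.iter_attachments()],
    )
```

Calling `parse_eml(open(path, "rb").read())` on an .eml file would return one such record. Signature and mail history are left out on purpose: they are not structured headers, so extracting them would need heuristics on the body text.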

PeterStaar-IBM commented 1 week ago

@ByteMeFree This would entail adding dedicated backends to the library. Do you know of any good libraries for parsing these formats?

ByteMeFree commented 1 week ago

    import extract_msg
    msg = extract_msg.Message(file_path)

https://github.com/TeamMsgExtractor/msg-extractor?tab=readme-ov-file

    from ics import Calendar

https://github.com/ics-py/ics-py

PeterStaar-IBM commented 1 week ago

@ByteMeFree We do not accept any libraries with viral licensing (e.g. GPL), in order to maintain our MIT License.

ByteMeFree commented 1 week ago

Ah, I see... Makes sense.

ByteMeFree commented 1 week ago

I think this should be possible, right?

    from ics import Calendar

https://github.com/ics-py/ics-py