KartikTalwar / gmail.js

Gmail JavaScript API
MIT License

How To Extract Topmost Part of Email with Multiple Levels of Quotes using gmail.js? #760

Closed jt-wang closed 1 year ago

jt-wang commented 1 year ago

Hi there,

I am having difficulty extracting only the most recent segment of a layered email that includes multiple replies and quotes, and your insight would be greatly appreciated.

Here is the structure of a sample email with numerous back-and-forth interactions and varying layers of quotes:

Lorem ipsum [Recipient's First Name],

Lorem ipsum dolor sit amet!

Best regards,
[Sender's FullName]

On [Day], [Month] [Date], [Year] at [Time] [Recipient's Full Name] <[Recipient's Email Address]> wrote:

> Lorem ipsum dolor sit, [Sender's First Name];
> Lorem ipsum dolor sit amet!
> .
> .
> From: [Sender's FullName] <[Sender's Email Address]>
> Date: [Day], [Month] [Date], [Year] at [Time]
> To: [Recipient's Full Name] <[Recipient's Email Address]>
> Subject: Lorem Ipsum: [Subject]
>
> Lorem ipsum dolor sit, [Recipient's First Name],
>
> Best regards,
>
> [Sender's FullName]
>
> On [Day], [Month] [Date], [Year] at [Time] [Sender's FullName] <[Sender's Email Address]>
>
> > Lorem ipsum dolor sit amet,
> >
> > Best regards,
...

From this complex email thread, my primary goal is to extract only:

Lorem ipsum [Recipient's First Name],

Lorem ipsum dolor sit amet!

Best regards,
[Sender's FullName]

Presently, I am using

gmail.new.get.email_data(domEmail).content_html

to fetch the entire email content, and as a first step I remove all HTML tags. Nevertheless, I am unable to isolate the topmost segment of the email content that excludes any ‘quoted replies’. I considered just using regex, but soon realized it's not feasible: different email clients have their own ways of "quoting" history, possibly with different indentation, and even in different languages.
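To make the limitation concrete, here is a minimal sketch of the plain-text heuristic described above. It assumes the HTML tags have already been stripped, and it only recognises the English-language shapes shown in the sample ("On ... wrote:", lines starting with ">", and a "From:" header block). The function name `extractTopmostText` and the pattern list are hypothetical, not part of gmail.js.

```javascript
// Heuristic markers for the start of quoted history. These are assumptions
// based on the sample email above, not an exhaustive or localized list.
const QUOTE_START_PATTERNS = [
  /^On .+ wrote:\s*$/, // attribution line such as "On Mon, ... <a@b.c> wrote:"
  /^>/,                // a line quoted with ">"
  /^From: .+/,         // header block used by some clients when replying
];

// Returns the text before the first line that looks like quoted history.
// If no marker is found, the whole (trimmed) text is returned.
function extractTopmostText(plainText) {
  const lines = plainText.split(/\r?\n/);
  const cut = lines.findIndex((line) =>
    QUOTE_START_PATTERNS.some((re) => re.test(line.trim()))
  );
  const kept = cut === -1 ? lines : lines.slice(0, cut);
  return kept.join("\n").trim();
}
```

This fails in exactly the cases mentioned above: an attribution line in another language, an attribution line wrapped over two lines, or a new message that itself contains a line starting with "From:".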

Could you please guide me on how to selectively get this part of the email using the gmail.js API, or suggest any other recommended approach for this task?

Thank you for your time and assistance in this matter!

josteink commented 1 year ago

That sounds like a general text-processing issue and not something specific to Gmail.js.

It’s probably also complex enough in itself to warrant an npm package.

I suggest asking on Stack Overflow, ChatGPT or other resources better suited for such questions. Good luck! 🙂

jt-wang commented 1 year ago

Thanks Jostein for the quick response! Yeah, it's true that it pertains to text processing, but I thought Gmail.js would be able to replicate what the Gmail front-end itself does: differentiate the most recent email body from all the history.

josteink commented 1 year ago

We don’t have an API for that.

If you find the proper CSS selectors to do that, feel free to make a PR 😊
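A selector-based approach could look roughly like the sketch below. The selectors `div.gmail_quote` and `blockquote` are assumptions about how quoted history is commonly marked up, not something gmail.js exposes, and the helper name `textWithoutQuotes` is hypothetical. Mail written in other clients may use different markup, so treat this as a starting point for experimentation.

```javascript
// Selectors assumed to match quoted history in the rendered email HTML.
// They are guesses from typical markup, not part of the gmail.js API.
const QUOTE_SELECTORS = ["div.gmail_quote", "blockquote"];

// Works on a deep clone so the live page is never modified.
// `rootElement` is any DOM element containing the email body.
function textWithoutQuotes(rootElement, selectors = QUOTE_SELECTORS) {
  const clone = rootElement.cloneNode(true);
  clone
    .querySelectorAll(selectors.join(","))
    .forEach((node) => node.remove());
  return clone.textContent.trim();
}

// Possible browser usage, given the content_html string from the question:
//   const body = new DOMParser().parseFromString(html, "text/html").body;
//   const topmost = textWithoutQuotes(body);
```

Removing nodes from the parsed HTML, rather than stripping tags first, keeps the structural hints that the plain text loses.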

jt-wang commented 1 year ago

Thanks! I'll close the issue for now.