MAIF / melusine

📧 Melusine: Use python to automatize your email processing workflow
https://maif.github.io/melusine

Issue with attachment type metadata #115

Closed: Maxime-POULAIN-Verlingue closed this issue 2 years ago

Maxime-POULAIN-Verlingue commented 2 years ago

Hey!

We had a problem with the attachment type in the metadata. As you can see in the screenshot below, we got only two values after applying our metadata pipeline: 0 when the mail has an attachment and 1 when it has none. The screenshot is an extract of the DataFrame called df_emails.

[Screenshot: extract of df_emails showing the erroneous attachment_type values]

Here is how we create our pipeline and apply it to our emails:

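from sklearn.pipeline import Pipeline
from melusine.prepare_email.metadata_engineering import MetaExtension, MetaDate, MetaAttachmentType, Dummifier

# Chain the metadata transformers, then one-hot encode the resulting columns
Metadatapipeline = Pipeline([
    ('MetaExtension', MetaExtension()),
    ('MetaDate', MetaDate()),
    ('MetaAttachmentType', MetaAttachmentType()),
    ('Dummifier', Dummifier(columns_to_dummify=['extension', 'attachment_type', 'dayofweek', 'hour', 'min'])),
])
df_meta = Metadatapipeline.fit_transform(df_emails)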

Next, here is the function in melusine/prepare_email/metadata_engineering.py that is supposed to extract the attachment type: [two screenshots of the code]

We added some print statements to understand the problem. As you can see, when the mail has at least one attachment, x is a str, and when it has no attachment, x is NaN.
When the function handles a mail with an attachment, row["attachment"] is a string rather than a list, for example "['image002.png', 'image003.jpg']". The for loop therefore iterates over that string character by character instead of over the file names. This seems to be the cause of our issue.
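A minimal sketch of this behaviour in plain Python, reusing the example value above. It assumes the column holds the string representation of a list; one plausible cause (an assumption, not confirmed in this thread) is that the DataFrame was read back from a CSV file, which stores lists as text.

```python
import ast

# Assumed cell content: the list serialized as a string (example value from the issue).
raw = "['image002.png', 'image003.jpg']"

# Iterating over the str yields single characters, not file names.
chars = [item for item in raw]
print(chars[:3])  # ['[', "'", 'i']

# Parsing the string literal recovers the intended list of file names.
files = ast.literal_eval(raw)
print(files)  # ['image002.png', 'image003.jpg']
```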

To fix this problem, we made the following change: [screenshot of the fix]

This seems to solve our issue: [two screenshots of the corrected output]
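The exact fix is only visible as a screenshot. As an illustration only, a hypothetical helper (normalize_attachments is our own name, not a Melusine function) could normalize the attachment column so every cell is a real list before MetaAttachmentType runs, treating NaN as "no attachment":

```python
import ast
import math


def normalize_attachments(value):
    """Hypothetical helper: return an attachment cell as a list of file names."""
    # No attachment: pandas represents the missing value as float NaN.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return []
    # Already a proper list: keep it unchanged.
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        if not value.strip():
            return []
        try:
            # Stringified list such as "['image002.png', 'image003.jpg']".
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Not a Python literal: treat it as a single file name.
            return [value]
        if isinstance(parsed, (list, tuple)):
            return list(parsed)
        return [value]
    raise TypeError(f"Unexpected attachment value: {value!r}")


print(normalize_attachments("['image002.png', 'image003.jpg']"))  # ['image002.png', 'image003.jpg']
print(normalize_attachments(float("nan")))  # []
```

It could be applied before the pipeline with df_emails["attachment"] = df_emails["attachment"].apply(normalize_attachments). ast.literal_eval is used rather than eval because it only evaluates Python literals, so arbitrary cell content is never executed.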

Python version: 3.8.12

Melusine version: 2.3.1

Operating System: Windows

TFA-MAIF commented 2 years ago

Hi Maxime,

We are currently refactoring Melusine, so it may be too early to integrate this change. We will keep your suggestion in mind but are putting it on hold for now.

Best regards

Maxime-POULAIN-Verlingue commented 2 years ago

Hey!

I downloaded the latest version of Melusine (2.3.4), and I no longer seem to have this issue with it. I am closing this issue.

Best regards