Maxime-POULAIN-Verlingue closed this issue 2 years ago
Hi Maxime,
We are currently refactoring Melusine, so it may be too early to integrate this. We will keep your suggestion in mind but are putting it on hold for the moment.
Best regards
Hey!
I downloaded the latest version of Melusine (2.3.4) and it seems I no longer have this issue with the new version. I am closing this issue.
Best regards
Hey!
We had a problem with the attachment type in the metadata. As you can see in the screenshot below, we got only two values after applying our Metadata pipeline: 0 when an attachment file is present and 1 when there is no attachment file in the mail. The screenshot is an extract of the DataFrame called df_email.
Here is how we create our pipeline and apply it to our emails:
Then, this is the function that is supposed to extract the type of the attachment file, in melusine/prepare_email/metadata_engineering.py:
We added some prints to understand what the problem is. As you can see, when there is at least one attachment file in the mail, the type of x is str, and when there is no attachment file, the value of x is nan.
When the function handles a mail with an attachment file, the value of row["attachment"] is a str, not a list. For example, we could have "['image002.png', 'image003.jpg']". The for loop then treats it as a plain str and iterates over it character by character instead of file name by file name. This seems to be the cause of our issue.
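To make the behaviour concrete, here is a minimal sketch, independent of Melusine. The function names `naive_extensions`, `parse_attachments` and `safe_extensions` are hypothetical and only illustrate the idea; they are not the Melusine code. The sketch shows how looping over the stringified list yields one item per character, and one possible way to turn the string back into a list with `ast.literal_eval` before extracting the extensions.

```python
import ast
import math


def naive_extensions(attachment):
    """Hypothetical stand-in for the faulty loop: iterates over the value as-is."""
    # If `attachment` is a str, this loops over single characters.
    return [item.split(".")[-1] for item in attachment]


def parse_attachments(attachment):
    """Hypothetical helper: normalise the cell value to a list of file names."""
    # None or NaN means the mail has no attachment.
    if attachment is None or (isinstance(attachment, float) and math.isnan(attachment)):
        return []
    if isinstance(attachment, str):
        try:
            # "['a.png', 'b.jpg']" -> ['a.png', 'b.jpg']
            parsed = ast.literal_eval(attachment)
        except (ValueError, SyntaxError):
            # Not a Python literal: treat the string as a single file name.
            return [attachment]
        if isinstance(parsed, (list, tuple)):
            return list(parsed)
        return [attachment]
    return list(attachment)


def safe_extensions(attachment):
    """Return the lower-cased extension of each attachment file name."""
    names = parse_attachments(attachment)
    return [name.rsplit(".", 1)[-1].lower() for name in names if "." in name]


raw = "['image002.png', 'image003.jpg']"
print(len(naive_extensions(raw)) == len(raw))  # True: one item per character
print(safe_extensions(raw))                    # ['png', 'jpg']
print(safe_extensions(float("nan")))           # []
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for values read back from a file.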
To fix this problem, we made the following change:
This seems to solve our issue:
Python version: 3.8.12
Melusine version: 2.3.1
Operating System: Windows