MAIF / melusine

📧 Melusine: Use python to automatize your email processing workflow
https://maif.github.io/melusine
Other
352 stars 58 forks

Add phraser_on_clean_text #116

Closed: Maxime-POULAIN-Verlingue closed this issue 2 years ago.

Maxime-POULAIN-Verlingue commented 2 years ago

Hello!

Description of Problem: We wanted to concatenate clean_header and clean_body into a new column, clean_text. As you can see in the screenshot below, we did this after applying the Transformer Pipeline. Following the examples in the tutorials, we then want to use an NLP Pipeline with a phraser function and a tokenizer. Our problem: we can apply the tokenizer to the column we want (clean_text), but we cannot apply the phraser to it. Melusine only provides two phraser functions, phraser_on_body and phraser_on_header, which apply the phraser to the clean_body and clean_header columns respectively. We also cannot concatenate clean_body and clean_header after applying those phraser functions, because they run in the same pipeline as the tokenizer, which expects clean_text to exist already.

[Screenshot: Pb_phraser]
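A minimal sketch of the concatenation step and its ordering constraint, written in plain Python with dict rows standing in for DataFrame rows. The helpers `concat_clean_text` and `tokenize` are hypothetical and not part of Melusine (the toy tokenizer only splits on whitespace); the sketch just shows that clean_text has to be built before any step that reads it.

```python
from typing import List, Mapping


def concat_clean_text(row: Mapping[str, str], sep: str = " ") -> str:
    """Hypothetical helper (not part of Melusine): join clean_header and clean_body.

    Works on any mapping with a `.get` method, e.g. a dict or a pandas row
    passed through `DataFrame.apply(..., axis=1)`.
    """
    parts = [row.get("clean_header", ""), row.get("clean_body", "")]
    # Drop empty parts so an empty header does not leave a leading separator.
    return sep.join(p for p in parts if p)


def tokenize(text: str) -> List[str]:
    """Toy whitespace tokenizer standing in for Melusine's tokenizer."""
    return text.split()


rows = [
    {"clean_header": "demande de devis", "clean_body": "bonjour je voudrais un devis"},
    {"clean_header": "", "clean_body": "merci"},
]
for row in rows:
    # clean_text must exist before any step (phraser or tokenizer) that reads it.
    row["clean_text"] = concat_clean_text(row)
    row["tokens"] = tokenize(row["clean_text"])
```

Because the phraser functions and the tokenizer share one pipeline, this concatenation has to run either before that pipeline starts or as its own step inside it; it cannot be added after the pipeline has finished.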

Overview of the Solution: One possible solution is to add a phraser function for clean_text in "melusine/nlp_tools/phraser.py". It would let us apply the phraser to a column named clean_text.
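A rough sketch of what such a function could look like, assuming it mirrors phraser_on_body and phraser_on_header by reading one column from a row and phrasing its text. `ToyPhraser`, `phraser_on_column` and this `phraser_on_clean_text` are hypothetical illustrations; `ToyPhraser` is not Melusine's Phraser class and only mimics turning known word pairs into single tokens.

```python
from typing import Iterable, List, Mapping, Set, Tuple


class ToyPhraser:
    """Hypothetical stand-in for a trained phraser: joins known bigrams with '_'."""

    def __init__(self, bigrams: Iterable[Tuple[str, str]]):
        self.bigrams: Set[Tuple[str, str]] = set(bigrams)

    def phrase(self, text: str) -> str:
        words = text.split()
        out: List[str] = []
        i = 0
        while i < len(words):
            # Greedily merge the current word with the next one if the pair is known.
            if i + 1 < len(words) and (words[i], words[i + 1]) in self.bigrams:
                out.append(f"{words[i]}_{words[i + 1]}")
                i += 2
            else:
                out.append(words[i])
                i += 1
        return " ".join(out)


def phraser_on_column(row: Mapping[str, str], phraser: ToyPhraser, column: str) -> str:
    """Hypothetical generic helper: apply `phraser` to one text column of `row`."""
    return phraser.phrase(row[column])


def phraser_on_clean_text(row: Mapping[str, str], phraser: ToyPhraser) -> str:
    """Hypothetical counterpart of phraser_on_body / phraser_on_header for clean_text."""
    return phraser_on_column(row, phraser, "clean_text")


phraser = ToyPhraser([("carte", "grise"), ("bonjour", "madame")])
result = phraser_on_clean_text({"clean_text": "bonjour madame ma carte grise est perdue"}, phraser)
# result == "bonjour_madame ma carte_grise est perdue"
```

With pandas, a row-wise function like this would typically be applied as `df.apply(phraser_on_clean_text, axis=1, args=(phraser,))`. Taking the column name as a parameter in `phraser_on_column` is one design option that avoids writing a separate function for every text column.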


Result: [Screenshot: Solution_pb_phraser]

TFA-MAIF commented 2 years ago

Hi Maxime,

We have a similar modification on the way. It should be released in the next version of Melusine.

Best regards