MAIF / melusine

📧 Melusine: Use python to automatize your email processing workflow
https://maif.github.io/melusine

use flashtext as a replacement for regex #47

Closed by remiadon 4 years ago

remiadon commented 4 years ago

FlashText:

I see that melusine uses a lot of regex for preprocessing/cleaning. I wonder if FlashText would be useful to melusine.

nicolaspte commented 4 years ago

Hi Remi! Thanks for the tip!

FlashText is indeed much faster than ordinary regex for word lists with more than 500 items.

We therefore decided to use it for name flagging, which looks names up in a .csv file containing several thousand items. This makes the computation 20x faster! The update will be included in the next version.

The other regexes use smaller keyword lists (~100 items), so FlashText is not relevant for them for now.
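
To illustrate why a large name list favours this approach, here is a minimal standard-library sketch of single-pass, trie-based keyword flagging (the general idea behind FlashText) next to a regex alternation. It is neither Melusine's code nor FlashText's: the name list, the `flag_name_` token, and all function names are hypothetical and chosen only for the example.

```python
import re

FLAG = "flag_name_"  # hypothetical placeholder token, not taken from Melusine
END = None           # trie key marking the end of a complete keyword


def is_word_char(ch):
    """Treat letters, digits and underscore as word characters."""
    return ch.isalnum() or ch == "_"


def build_trie(keywords):
    """Store every lowercased keyword as a character path in nested dicts."""
    trie = {}
    for keyword in keywords:
        node = trie
        for ch in keyword.lower():
            node = node.setdefault(ch, {})
        node[END] = True
    return trie


def flag_names_trie(text, trie, flag=FLAG):
    """Replace whole-word keyword matches; prefer the longest match at each position."""
    out, i, n = [], 0, len(text)
    while i < n:
        match_end = -1
        # Only try to match where a word starts.
        if i == 0 or not is_word_char(text[i - 1]):
            node, j = trie, i
            while j < n and text[j].lower() in node:
                node = node[text[j].lower()]
                j += 1
                # Accept only if the keyword also ends on a word boundary.
                if END in node and (j == n or not is_word_char(text[j])):
                    match_end = j
        if match_end != -1:
            out.append(flag)
            i = match_end
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def flag_names_regex(text, names, flag=FLAG):
    """Baseline: one big alternation, longest names first."""
    ordered = sorted(names, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(name) for name in ordered) + r")\b"
    return re.sub(pattern, flag, text, flags=re.IGNORECASE)


# Hypothetical names; per the thread, the real list is loaded from a .csv file.
names = ["bob", "jean", "jean dupont"]
text = "Hello Bob, please call Jean Dupont or bobby."
print(flag_names_trie(text, build_trie(names)))
print(flag_names_regex(text, names))
```

In the trie version, the work done at each text position is bounded by the length of the longest keyword, not by the number of keywords. A regex alternation may have to try many alternatives at each position, which is consistent with the speed-up reported above for lists of several thousand names.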