MAIF / melusine

📧 Melusine: Use python to automatize your email processing workflow
https://maif.github.io/melusine

Parameters #119

Closed: hugo-quantmetry closed this issue 2 years ago

hugo-quantmetry commented 2 years ago

Description of Problem: The current Melusine Tokenizer is frequently called implicitly, and users have no control over it. Users should be able to specify which tokenizer a NeuralModel uses.

Examples:

tokenizer = MelusineTokenizer(tokenizer_regex, stopwords, flags)
tokens = tokenizer.tokenize("Hello John how are you")
tokenizer.save("tokenizer.json")
tokenizer_reloaded = MelusineTokenizer.load("tokenizer.json")

model = NeuralModel(..., tokenizer=tokenizer)

Definition of Done: The new tokenizer class supports the tokenize, save, and load calls shown in the examples. Users can specify which tokenizer their NeuralModel uses.
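
To make the request concrete, here is a minimal, self-contained sketch of what such a tokenizer and its injection into a model could look like. It is not Melusine's implementation: `SketchTokenizer` and `SketchModel` are hypothetical stand-ins for the proposed `MelusineTokenizer` and for `NeuralModel`, and the default regex, the case-insensitive stopword matching, and the JSON layout are all assumptions made for illustration.

```python
import json
import re


class SketchTokenizer:
    """Hypothetical stand-in for the proposed MelusineTokenizer (not the Melusine API)."""

    def __init__(self, tokenizer_regex=r"\w+", stopwords=None, flags=re.IGNORECASE):
        # Assumption: the regex has no capture groups, so findall returns whole matches.
        self.tokenizer_regex = tokenizer_regex
        self.stopwords = sorted(set(stopwords or []))
        self.flags = int(flags)  # stored as an int so it is JSON-serializable
        self._pattern = re.compile(self.tokenizer_regex, self.flags)

    def tokenize(self, text):
        # Split with the regex, then drop stopwords (compared case-insensitively).
        stop = {word.lower() for word in self.stopwords}
        return [tok for tok in self._pattern.findall(text) if tok.lower() not in stop]

    def save(self, path):
        # Persist only the constructor arguments; the compiled pattern is rebuilt on load.
        config = {
            "tokenizer_regex": self.tokenizer_regex,
            "stopwords": self.stopwords,
            "flags": self.flags,
        }
        with open(path, "w", encoding="utf-8") as f:
            json.dump(config, f)

    @classmethod
    def load(cls, path):
        with open(path, encoding="utf-8") as f:
            return cls(**json.load(f))


class SketchModel:
    """Hypothetical model showing explicit tokenizer injection (not NeuralModel)."""

    def __init__(self, tokenizer=None):
        # The caller chooses the tokenizer; a default is built only if none is given.
        self.tokenizer = tokenizer if tokenizer is not None else SketchTokenizer()

    def preprocess(self, text):
        return self.tokenizer.tokenize(text)


tokenizer = SketchTokenizer(stopwords=["how", "are"])
tokens = tokenizer.tokenize("Hello John how are you")  # ['Hello', 'John', 'you']
model = SketchModel(tokenizer=tokenizer)
```

Saving only the constructor arguments keeps the JSON file readable and lets `load` rebuild an equivalent tokenizer, which is one way to satisfy the save/reload round trip in the examples above.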