Description of Problem:
The current Melusine Tokenizer is frequently called implicitly, and users have no control over it.
The user should be able to specify which tokenizer should be used by a NeuralModel.
Examples:
tokenizer = MelusineTokenizer(tokenizer_regex, stopwords, flags)
tokens = tokenizer.tokenize("Hello John how are you")
tokenizer.save("tokenizer.json")
tokenizer_reloaded = MelusineTokenizer.load("tokenizer.json")
model = NeuralModel(..., tokenizer=tokenizer)
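One possible shape for the requested class is sketched below. It is an illustration only, not the Melusine implementation: the default regex, the stopword filtering behaviour, and the JSON layout are all assumptions, and only the constructor arguments and the tokenize/save/load method names come from the examples above.

```python
import json
import re


class MelusineTokenizer:
    """Hypothetical sketch of a regex-based tokenizer with JSON persistence."""

    def __init__(self, tokenizer_regex=r"\w+", stopwords=None, flags=0):
        # Keep plain, JSON-serializable copies of the configuration.
        self.tokenizer_regex = tokenizer_regex
        self.stopwords = list(stopwords or [])
        self.flags = int(flags)
        self._pattern = re.compile(self.tokenizer_regex, self.flags)
        self._stopword_set = set(self.stopwords)

    def tokenize(self, text):
        # Assumption: tokens are regex matches, minus exact-match stopwords.
        return [t for t in self._pattern.findall(text) if t not in self._stopword_set]

    def save(self, path):
        # Assumption: the configuration is stored as a flat JSON object.
        config = {
            "tokenizer_regex": self.tokenizer_regex,
            "stopwords": self.stopwords,
            "flags": self.flags,
        }
        with open(path, "w", encoding="utf-8") as f:
            json.dump(config, f)

    @classmethod
    def load(cls, path):
        # Rebuild the tokenizer from the saved configuration.
        with open(path, encoding="utf-8") as f:
            return cls(**json.load(f))


tokenizer = MelusineTokenizer(r"\w+", stopwords=["how", "are"])
print(tokenizer.tokenize("Hello John how are you"))  # ['Hello', 'John', 'you']
```

Storing only the configuration (regex, stopwords, flags) rather than a pickled object keeps the saved file readable and lets a NeuralModel reload exactly the tokenizer it was trained with.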
Definition of Done:
The new tokenizer class can tokenize text and can be saved to and reloaded from a file, as in the examples above.
Users can specify which tokenizer they want to use in their NeuralModel.