Open SG87 opened 5 years ago
Is it possible to start model training (main.py) from existing word embeddings like FastText?

We have an update planned to address more advanced tokenization/data processing, but currently there's no easy way. Loading the embedding weights into the model is the easy part; the harder part is changing the preprocessing to handle tokenization that isn't ASCII-256 character-level.
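For the embedding-loading half, here is a minimal sketch of building an embedding matrix from a FastText text-format `.vec` file (whose first line is a `<word_count> <dim>` header). The function name `load_vec` and the `vocab` token-to-index dict are illustrative, not part of this repo; this assumes NumPy and a vocabulary you have already built from your data.

```python
import numpy as np

def load_vec(path, vocab, dim):
    """Build a (len(vocab), dim) embedding matrix aligned to `vocab`
    from a FastText .vec file. Tokens absent from the file keep a
    small random initialization."""
    rng = np.random.default_rng(0)
    mat = rng.normal(scale=0.1, size=(len(vocab), dim)).astype(np.float32)
    with open(path, encoding="utf-8") as f:
        next(f)  # skip the "<word_count> <dim>" header line
        for line in f:
            parts = line.rstrip().split(" ")
            token = parts[0]
            if token in vocab:
                mat[vocab[token]] = np.asarray(parts[1:], dtype=np.float32)
    return mat
```

The resulting matrix can then be copied into whatever embedding layer the model uses; in PyTorch, for example, something like `embedding.weight.data.copy_(torch.from_numpy(mat))` (the layer's attribute name will depend on the model). Note this only covers the weights; as mentioned above, the preprocessing/tokenization still needs to be adapted from character-level to word-level.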