allenai / deep_qa

A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)
Apache License 2.0

Different Character Encodings #272

Open nelson-liu opened 7 years ago

nelson-liu commented 7 years ago

Using a byte encoding for Unicode characters could be a good idea, as opposed to assigning a single index to each Unicode character.

It would thus be nice if tokenizers that return characters allowed for different character encodings.
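
To make the difference concrete, here is a minimal sketch of the two schemes in plain Python. The function names `codepoint_ids` and `byte_ids` are hypothetical and are not part of deep_qa, and UTF-8 is only an assumed default for the byte encoding; the issue does not name one.

```python
from typing import List


def codepoint_ids(text: str) -> List[int]:
    """One index per Unicode character (here, simply its code point).

    Hypothetical helper for illustration; not a deep_qa function.
    """
    return [ord(char) for char in text]


def byte_ids(text: str, encoding: str = "utf-8") -> List[int]:
    """One index per byte of the encoded text; every value is in 0..255.

    Hypothetical helper for illustration; UTF-8 is an assumed default.
    """
    return list(text.encode(encoding))


word = "na\u00efve"  # "naive" with a diaeresis on the "i" (U+00EF)
print(codepoint_ids(word))  # one id per character
print(byte_ids(word))       # one id per byte; U+00EF takes two bytes in UTF-8
```

With byte ids the index space is fixed at 256 values whatever text is seen, at the cost of longer sequences for non-ASCII characters. With one index per character the sequences are shorter, but the index space grows with every new character encountered.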