Different Character Encodings

allenai / deep_qa

A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)

Apache License 2.0

Open nelson-liu opened 7 years ago

nelson-liu commented 7 years ago

Encoding Unicode characters as bytes could be a good idea, as opposed to assigning a single index to each Unicode character, since a byte encoding keeps the character vocabulary at no more than 256 entries.

It would therefore be nice if tokenizers that return characters supported different character encodings.
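
To make the difference concrete, here is a rough sketch of the two options. The function names `codepoint_indices` and `byte_indices` are hypothetical and are not part of the existing deep_qa tokenizer API; this only illustrates the idea, assuming UTF-8 as the default byte encoding.

```python
from typing import Dict, List


def codepoint_indices(text: str, vocab: Dict[str, int]) -> List[int]:
    """Hypothetical helper: one index per Unicode character.

    The vocabulary grows every time a previously unseen character appears.
    """
    return [vocab.setdefault(ch, len(vocab)) for ch in text]


def byte_indices(text: str, encoding: str = "utf-8") -> List[int]:
    """Hypothetical helper: one index per encoded byte.

    Every index is in range(256), regardless of which characters appear.
    """
    return list(text.encode(encoding))


vocab: Dict[str, int] = {}
print(codepoint_indices("héllo", vocab))  # [0, 1, 2, 2, 3]
print(byte_indices("héllo"))              # [104, 195, 169, 108, 108, 111]
print(byte_indices("héllo", "latin-1"))   # [104, 233, 108, 108, 111]
```

The trade-off is sequence length: with UTF-8, a non-ASCII character becomes several byte indices ("é" takes two), so sequences get longer while the index space stays fixed.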