cmp-nct / ggllm.cpp

Falcon LLM ggml framework with CPU and GPU support
Other
244 stars 21 forks

Tokenizer fix 1 #35

Closed cmp-nct closed 1 year ago

cmp-nct commented 1 year ago

- Corrected the handling of previously unsupported characters (European and Asian languages should work now)
- Added input conversion from UTF-16 to UTF-8 for Windows
- Added a tokenizer workaround that merges triplet tokens directly when they cannot be built from duplets present in the vocabulary
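To make the last two points concrete, here is a minimal, portable sketch of both ideas. It is not the code from this PR: the function names `utf16_to_utf8` and `merge_pieces`, the vocabulary represented as a plain set of strings, and the greedy merge order are all assumptions chosen for illustration. On Windows the conversion would more likely go through a system API; the hand-written version below only shows what the transformation does.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not from the PR): convert UTF-16 code units to UTF-8.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // stray low surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical greedy merge (not from the PR): join adjacent pieces into
// vocabulary entries, preferring pairs and falling back to triplets only
// when no adjacent pair is in the vocabulary.
std::vector<std::string> merge_pieces(std::vector<std::string> pieces,
                                      const std::unordered_set<std::string>& vocab) {
    bool merged = true;
    while (merged) {
        merged = false;
        // First choice: merge the leftmost adjacent pair found in the vocabulary.
        for (std::size_t i = 0; i + 1 < pieces.size(); ++i) {
            const std::string pair = pieces[i] + pieces[i + 1];
            if (vocab.count(pair) != 0) {
                pieces[i] = pair;
                pieces.erase(pieces.begin() + i + 1);
                merged = true;
                break;
            }
        }
        if (merged) continue;
        // Fallback: merge the leftmost adjacent triplet found in the vocabulary.
        for (std::size_t i = 0; i + 2 < pieces.size(); ++i) {
            const std::string triple = pieces[i] + pieces[i + 1] + pieces[i + 2];
            if (vocab.count(triple) != 0) {
                pieces[i] = triple;
                pieces.erase(pieces.begin() + i + 1, pieces.begin() + i + 3);
                merged = true;
                break;
            }
        }
    }
    return pieces;
}
```

A three-byte UTF-8 character such as "€" (bytes `E2 82 AC`) shows why the fallback matters: if the vocabulary contains the full character but neither two-byte combination, a pair-only merge would leave three separate byte pieces, while the triplet fallback joins them into one.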