Open cgisky1980 opened 1 year ago
Python & PyTorch are too big for end users.
I agree, but a pure C++ implementation is blocked on the tokenizer: specifically, the Unicode normalization and Unicode regexes that 20B_tokenizer requires. RWKV World models are easier to support, though, since their tokenizer does not require Unicode libraries.
I myself have no plans of implementing the tokenizer in C, but I welcome PRs.
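To illustrate why a tokenizer without Unicode requirements is easier to port, here is a minimal sketch of greedy longest-match tokenization over raw bytes. It assumes the tokenizer can be treated as a plain byte-string-to-id lookup; the names `Vocab`, `toy_vocab`, and `tokenize_greedy`, and the vocabulary entries themselves, are hypothetical and are not taken from any RWKV code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical vocabulary type: raw byte strings mapped to token ids.
using Vocab = std::unordered_map<std::string, int>;

// Hypothetical toy vocabulary for illustration only.
// A real one would be loaded from the model's vocabulary file.
Vocab toy_vocab() {
    return Vocab{
        {"a", 1}, {"b", 2}, {"ab", 3}, {"abc", 4}, {"c", 5},
        {"\xC3\xA9", 6},  // the two UTF-8 bytes of U+00E9, handled as plain bytes
    };
}

// Greedy longest-match tokenization over raw bytes.
// UTF-8 input is treated as an opaque byte sequence, so no Unicode
// normalization or regex library is involved.
// Returns false if some position matches no vocabulary entry.
bool tokenize_greedy(const Vocab& vocab, const std::string& text,
                     std::vector<int>& out) {
    out.clear();

    // Longest key length bounds how far ahead each lookup needs to go.
    std::size_t max_len = 0;
    for (const auto& entry : vocab) {
        max_len = std::max(max_len, entry.first.size());
    }

    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t limit = std::min(max_len, text.size() - pos);
        bool matched = false;
        // Try the longest candidate first, then shrink.
        for (std::size_t len = limit; len > 0; --len) {
            const auto it = vocab.find(text.substr(pos, len));
            if (it != vocab.end()) {
                out.push_back(it->second);
                pos += len;
                matched = true;
                break;
            }
        }
        if (!matched) {
            return false;  // no entry covers the byte at `pos`
        }
    }
    assert(pos == text.size());  // every byte was consumed
    return true;
}
```

With the toy vocabulary above, calling `tokenize_greedy(toy_vocab(), "abcab", ids)` picks "abc" before "ab". A production version would likely replace the repeated `substr` lookups with a trie, but the byte-level logic stays the same.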
@cgisky1980 Unfortunately, tokenizers-cpp will not work here:
You also need to turn on C++17 support.
Can we have an example of pure C++? Python & PyTorch are too big for end users.