The clean solution would be to use the `tokenizers` crate to remove the dependency on `transformers` in the Python package. In the meantime, it is unreasonable to ask downstream libraries to implement their own version of `adapt_tokenizer`, since this is always required to use the package.
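For concreteness, the adaptation downstream libraries would otherwise have to reimplement looks roughly like this. It is only a sketch: the patched attributes (`vocabulary`, `convert_token_to_string`) are assumptions about what the core expects, not the actual `adapt_tokenizer` implementation.

```python
# Sketch of an adapt_tokenizer-style helper. It assumes the core needs a
# tokenizer that exposes its vocabulary and can map a single token back to
# its surface string; the exact interface is an assumption, not the real API.
from transformers import AutoTokenizer, PreTrainedTokenizerBase

SPIECE_UNDERLINE = "\u2581"  # SentencePiece word-boundary marker


def adapt_tokenizer(tokenizer: PreTrainedTokenizerBase) -> PreTrainedTokenizerBase:
    """Patch a Hugging Face tokenizer for token-level string conversion."""
    tokenizer.vocabulary = tokenizer.get_vocab()

    def convert_token_to_string(token: str) -> str:
        string = tokenizer.convert_tokens_to_string([token])
        # Decoding a single SentencePiece token in isolation drops the
        # leading space; restore it from the word-boundary marker.
        if token.startswith(SPIECE_UNDERLINE):
            return " " + string
        return string

    tokenizer.convert_token_to_string = convert_token_to_string
    return tokenizer


# Usage:
# tokenizer = adapt_tokenizer(AutoTokenizer.from_pretrained("gpt2"))
```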
After https://github.com/dottxt-ai/outlines-core/pull/52, `outlines-core` no longer has tokenizer support, aside from the two copies of `TransformerTokenizer` in the test and benchmark code. What's the plan wrt. this?

If the plan is to use `adapt_tokenizer` to patch `transformers` tokenizers, it's not clear how that's an improvement over, for example, a custom tokenizer wrapper class and a conditional `transformers` dependency. In general, we could move `TransformerTokenizer` back to `outlines-core` and make `transformers` optional; then `outlines-core` will be usable with llama-based tokenizers and we won't need two copies for testing.

Originally posted by @brandonwillard in https://github.com/dottxt-ai/outlines-core/issues/2#issuecomment-2403490462
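For illustration, the conditional `transformers` dependency mentioned in the comment above could be as simple as deferring the import to construction time; the class shape and the `outlines-core[transformers]` extra are assumptions, not the current layout.

```python
# Sketch of a conditional `transformers` dependency: the import only happens
# when a TransformerTokenizer is constructed, so the core package does not
# require transformers at install time. The extras name is hypothetical.
class TransformerTokenizer:
    def __init__(self, model_name: str):
        try:
            from transformers import AutoTokenizer
        except ImportError as exc:
            raise ImportError(
                "TransformerTokenizer needs the optional `transformers` "
                "dependency, e.g. `pip install outlines-core[transformers]`."
            ) from exc
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.eos_token_id = self.tokenizer.eos_token_id
        self.vocabulary = self.tokenizer.get_vocab()
```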