mcchung52 opened this issue 6 months ago
For now, I'm getting this error:

raise TokenizerNotAvailableError("Failed to locate a suitable tokenizer implementation for '{}' (Make sure your current environment provides a tokenizer backend like 'transformers', 'tiktoken' or 'llama.cpp' for this model)".format(model_identifier))
lmql.runtime.tokenizer.TokenizerNotAvailableError: Failed to locate a suitable tokenizer implementation for 'huggyllama/llama-7b' (Make sure your current environment provides a tokenizer backend like 'transformers', 'tiktoken' or 'llama.cpp' for this model)
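For anyone debugging the same thing, a quick sanity check of which tokenizer backends are actually importable in the current environment (backend names taken from the error message; note that llama-cpp-python's module is named `llama_cpp`):

```python
import importlib.util

# Any one of these being present should satisfy LMQL's tokenizer lookup,
# per the error message above.
for backend in ("transformers", "tiktoken", "llama_cpp"):
    print(backend, "available:", importlib.util.find_spec(backend) is not None)
```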
Hello @mcchung52, I had similar issues. Please check this issue: #350
Hi, I had the same error when trying to run a model using the llama.cpp loader. It's not made very clear in the documentation, but to run a model with llama.cpp you have to install the package lmql[hf] (instead of plain lmql) together with llama-cpp-python, which provides the inference backend.
Does that help?
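For reference, a minimal sketch of the in-process setup described above. The model path is a placeholder and the tokenizer name matches the model from the error; it assumes `pip install lmql[hf] llama-cpp-python` has been run:

```python
import lmql

# Placeholder path: point this at one of your local .gguf files.
# The "local:" prefix makes LMQL run llama.cpp in-process (no server needed).
# The tokenizer is loaded via transformers, which is why lmql[hf] is required.
m = lmql.model(
    "local:llama.cpp:/path/to/llama-7b.Q4_K_M.gguf",
    tokenizer="huggyllama/llama-7b",
)

@lmql.query(model=m)
def hello():
    '''lmql
    "Say hello: [GREETING]" where len(TOKENS(GREETING)) < 20
    '''

print(hello())
```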
LMQL looks very promising (having played with Guidance), so I want to make this work, but I'm having issues from the get-go trying to run it locally. I'm really hoping I can get some help.
IMMEDIATE GOAL: What is the simplest way to make this work?
Context: I have several gguf models on my computer that I want to run on my MacBook Pro (pre-M1, Intel), i.e. on the CPU, which I have done many times before from Python code, albeit slowly.
I want to:
1. run the model directly in Python code
2a. run the model by exposing it via an API, e.g. localhost:8081 (see the sketch after this list)
2b. (not possible on my Mac, but possible on my PC) run the gguf via LM Studio on the PC, expose ip:port there, and have Python code on my Mac connect to it
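For 2a, what I've pieced together from the docs (untested on my machine; the paths and default port are my assumptions) is LMQL's own client/server split, where `lmql serve-model` hosts the weights and the query connects to it:

```python
import lmql

# Server started separately in another terminal, e.g.:
#   lmql serve-model llama.cpp:/path/to/llama-7b.Q4_K_M.gguf
# (placeholder path; it listens on localhost:8080 by default, if I'm
# reading the docs right).
#
# Without the "local:" prefix, LMQL connects to that running server
# instead of loading the weights in-process.
m = lmql.model(
    "llama.cpp:/path/to/llama-7b.Q4_K_M.gguf",
    tokenizer="huggyllama/llama-7b",
)

@lmql.query(model=m)
def ping():
    '''lmql
    "Are you there? [ANSWER]" where len(TOKENS(ANSWER)) < 30
    '''

print(ping())
```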
Code:
Thanks in advance.