Closed dnhkng closed 9 months ago
Could you please include your version info in the version section? There was a recent change which may have fixed this.
python -c "from outlines import _version; print(_version.version)"
python -c "import sys; print('Python', sys.version)"
pip freeze
There's a good chance that upgrading to the latest (unreleased) 0.0.25 would fix this:
pip install outlines git+https://github.com/outlines-dev/outlines
I was on 0.0.24. I can confirm that 0.0.25 fixes the issue with "outlines.generate.choice"...
But now "outlines.generate.format" throws the same kind of error!
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 3
      1 prompt = "sqrt(2)="
----> 3 generator = outlines.generate.format(model, float)
      4 answer = generator(prompt)

File ~/miniforge3/envs/frankenmerge/lib/python3.10/site-packages/outlines/generate/api.py:396, in format(model, python_type, max_tokens, sampler)
    392 def format(
    393     model, python_type, max_tokens: Optional[int] = None, sampler: Sampler = multinomial
    394 ):
    395     regex_str = python_types_to_regex(python_type)
--> 396     return regex(model, regex_str, max_tokens, sampler)

File ~/miniforge3/envs/frankenmerge/lib/python3.10/site-packages/outlines/generate/api.py:370, in regex(model, regex_str, max_tokens, sampler)
    364 def regex(
    365     model,
    366     regex_str: str,
    367     max_tokens: Optional[int] = None,
    368     sampler: Sampler = multinomial,
    369 ):
--> 370     fsm = RegexFSM(regex_str, model.tokenizer)
    372     device = model.device
    373     generator = SequenceGenerator(fsm, model, sampler, device, max_tokens=max_tokens)

File ~/miniforge3/envs/frankenmerge/lib/python3.10/site-packages/outlines/fsm/fsm.py:120, in RegexFSM.__init__(self, regex_string, tokenizer)
    114         raise ValueError(
    115             "The vocabulary does not allow us to build a sequence that matches the input regex"
    116         )
    118     return states_to_token_maps, empty_token_ids
--> 120 self.states_to_token_maps, self.empty_token_ids = create_states_mapping(
    121     regex_string, tuple(sorted(tokenizer.vocabulary.items()))
    122 )
    123 self.vocabulary = tokenizer.vocabulary.values()
    124 self.eos_token_id = tokenizer.eos_token_id

ValueError: too many values to unpack (expected 2)
This is very weird, as I see that "create_states_mapping" should return only two objects: states_to_token_maps and empty_token_ids. But when I print what is returned, I see it's 3 objects:
({0: {59: 3, 52: 3, 29945: 3, 54: 3, 29947: 3, 56: 3, 60: 3, 29896: 3, 58: 3, 29929: 3, 29953: 3, 48: 1, 46: 1, 29974: 1, 29899: 1, 29955: 3, 55: 3, 53: 3, 29946: 3, 29906: 3, 51: 2, 57: 3, 29941: 3, 29900: 2}, 1: {59: 3, 52: 3, 29945: 3, 54: 3, 29947: 3, 56: 3, 60: 3, 29896: 3, 58: 3, 29929: 3, 29953: 3, 29955: 3, 55: 3, 53: 3, 29946: 3, 29906: 3, 51: 2, 57: 3, 29941: 3, 29900: 2}, 2: {29872: 5, 72: 5, 29889: 4, 2: 2, 104: 5, 49: 4, 29923: 5}, 3: {59: 3, 29889: 4, 52: 3, 29945: 3, 49: 4, 72: 5, 104: 5, 54: 3, 29947: 3, 56: 3, 60: 3, 29896: 3, 58: 3, 29929: 3, 51: 3, 29953: 3, 29923: 5, 29872: 5, 29900: 3, 29955: 3, 55: 3, 53: 3, 29946: 3, 29906: 3, 2: 3, 57: 3, 29941: 3}, 4: {29955: 8, 55: 8, 29946: 8, 29906: 8, 57: 8, 29941: 8, 59: 8, 52: 8, 29945: 8, 54: 8, 29947: 8, 58: 8, 56: 8, 60: 8, 29896: 8, 29929: 8, 51: 8, 29953: 8, 53: 8, 29900: 8}, 5: {29899: 6, 48: 6, 29974: 6, 46: 6}, 6: {54: 7, 29947: 7, 56: 7, 29896: 7, 58: 7, 29929: 7, 60: 7, 29953: 7, 51: 7, 29900: 7, 29955: 7, 53: 7, 29946: 7, 55: 7, 29906: 7, 57: 7, 29941: 7, 59: 7, 52: 7, 29945: 7}, 7: {54: 7, 29947: 7, 56: 7, 29896: 7, 58: 7, 29929: 7, 60: 7, 29953: 7, 51: 7, 29900: 7, 29955: 7, 53: 7, 29946: 7, 55: 7, 29906: 7, 2: 7, 57: 7, 29941: 7, 59: 7, 52: 7, 29945: 7}, 8: {29955: 8, 55: 8, 29946: 8, 29906: 8, 2: 8, 57: 8, 29941: 8, 72: 5, 59: 8, 29923: 5, 52: 8, 29945: 8, 29872: 5, 54: 8, 29947: 8, 58: 8, 56: 8, 60: 8, 29896: 8, 29929: 8, 51: 8, 29953: 8, 104: 5, 53: 8, 29900: 8}}, set(), frozenset({2, 3, 7, 8, -1}))
Running the outlines.generate.choice method returns 2 objects correctly, the dictionary, and the set.
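One way a function can appear to return more values than its source code says is a persistent cache serving a result stored by a different version of that function. The snippet below is only a minimal sketch of that mechanism, not the Outlines implementation: disk_cached, mapping_three_values, and mapping_two_values are hypothetical names, and a pickle file in a temporary directory stands in for the real on-disk cache.

```python
import pickle
import tempfile
from pathlib import Path


def disk_cached(cache_dir, key, compute):
    """Return the pickled result for `key` if present, else compute and store it."""
    path = Path(cache_dir) / f"{key}.pkl"
    if path.exists():
        # Cache hit: `compute` is never called, whatever it would return now.
        return pickle.loads(path.read_bytes())
    result = compute()
    path.write_bytes(pickle.dumps(result))
    return result


def mapping_three_values():
    # Hypothetical stand-in for one version of the function: three objects.
    return {0: {59: 3}}, set(), frozenset({2, 3})


def mapping_two_values():
    # Hypothetical stand-in for another version of the function: two objects.
    return {0: {59: 3}}, set()


def demo_stale_unpack():
    """Populate the cache with a 3-tuple, then unpack a cache hit into 2 names."""
    with tempfile.TemporaryDirectory() as cache_dir:
        disk_cached(cache_dir, "float_regex", mapping_three_values)
        # The calling code now expects two values, but the stored entry wins.
        cached = disk_cached(cache_dir, "float_regex", mapping_two_values)
        try:
            states, empty_ids = cached
        except ValueError as exc:
            return len(cached), str(exc)
    return len(cached), None


if __name__ == "__main__":
    print(demo_stale_unpack())
```

In this sketch the unpacking fails with the same kind of ValueError as in the traceback, even though the function being called at that point returns two objects.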
Maybe found a quick fix: commenting out the @cache seems to fix this!
i.e.
class RegexFSM(FSM):
    """FSM to generate text that is in the language of a regular expression."""

    def __init__(self, regex_string: str, tokenizer: "Tokenizer"):
        # @cache()
        def create_states_mapping(
Not sure how this affects performance though, but:
prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?
Review: This restaurant is just awesome!
"""
generator = outlines.generate.choice(model, ["Positive", "Negative"])
for i in range(100):
    answer = generator(prompt)
With cache commented out: 4.38 s ± 429 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
With cache: 4.06 s ± 43.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
https://github.com/outlines-dev/outlines/pull/566 should have fixed this. It invalidates the cache if the version is upgraded.
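For readers unfamiliar with the idea, version-based cache invalidation can be sketched roughly as follows. This is an illustration under assumptions, not the code from the pull request: ensure_cache_matches_version and the version.json marker file are hypothetical names chosen for the example.

```python
import json
import shutil
import tempfile
from pathlib import Path


def ensure_cache_matches_version(cache_dir, current_version):
    """Wipe `cache_dir` when it was written by a different version.

    Returns True if the cache was (re)initialised, False if it was kept.
    """
    cache_dir = Path(cache_dir)
    marker = cache_dir / "version.json"  # hypothetical marker file name
    stored = None
    if marker.exists():
        stored = json.loads(marker.read_text())["version"]
    if stored == current_version:
        return False
    # Version changed (or no marker yet): drop everything and start fresh.
    shutil.rmtree(cache_dir, ignore_errors=True)
    cache_dir.mkdir(parents=True, exist_ok=True)
    marker.write_text(json.dumps({"version": current_version}))
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "outlines"
        print(ensure_cache_matches_version(cache, "0.0.24"))
        print(ensure_cache_matches_version(cache, "0.0.24"))
        print(ensure_cache_matches_version(cache, "0.0.25.dev15+g0cd9608"))
```

A scheme like this only helps if the reported version string actually differs between installs; if two different builds report the same version, the stale entries are kept.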
Can you confirm which version you are running? Please run:
python -c "from outlines import _version; print(_version.version)"
(this command will include the git revision, which is helpful for me)
I'm on: 0.0.25.dev15+g0cd9608
I found the source of the issue. The Outlines cache is cleared if there's a version upgrade; however, installing from git via pip doesn't seem to set the version in the same way that pip install . from the repo directory does.
root@C.8986380:~$ pip install outlines git+https://github.com/outlines-dev/outlines -q
root@C.8986380:~$ python3 -c "from outlines._version import __version__ as outlines_version; print(outlines_version)"
0.0.24
We need to ensure the version reported by from outlines._version import __version__ is distinct even if installed from pip. Thanks for helping us discover this!
@dnhkng as a temporary fix I recommend running rm -rf ~/.cache/outlines
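For anyone who prefers to do this from Python (for example inside a notebook), a rough equivalent of the rm -rf command is sketched below. The helper name clear_cache_dir is made up for this example, and the default path simply mirrors the ~/.cache/outlines location mentioned above; the usage lines run against a throwaway directory rather than the real cache.

```python
import shutil
import tempfile
from pathlib import Path


def clear_cache_dir(cache_dir=None):
    """Delete the cache directory if it exists; return True if it was removed."""
    if cache_dir is None:
        # Assumed default, taken from the rm -rf command in this thread.
        cache_dir = Path("~/.cache/outlines").expanduser()
    cache_dir = Path(cache_dir)
    if not cache_dir.exists():
        return False
    shutil.rmtree(cache_dir)
    return True


# Usage on a scratch directory (not the real cache):
with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp) / "outlines"
    scratch.mkdir()
    removed_first = clear_cache_dir(scratch)  # directory existed
    removed_again = clear_cache_dir(scratch)  # already gone
```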
Best route forward IMO:
- 0.0.25 to incorporate the cache invalidation fix in an official release (@rlouf)
- setuptools_scm issue (I think it's upstream upon brief review), and recommend installation of prereleases via git clone <>, pip install . in the meantime.

Works in my environment. @dnhkng could you please confirm your reproduction code no longer fails in your conda environment if you run:
rm -rf ~/.cache/outlines
pip install outlines==0.0.24
python3 your_script_in_original_post.py
pip install outlines==0.0.25
python3 your_script_in_original_post.py
Looks ok now!
I'm still getting strings instead of floats, but I've raised a separate issue for that.
Describe the issue as clearly as possible:
When I try the examples on the GitHub front page, some do not work from a fresh conda environment.
Steps/code to reproduce the bug:
Expected result:
Error message:
Outlines/Python version information:
Context for the issue:
This is a very weird bug! If I run the "outlines.generate.format" code, very occasionally, I also get the "outlines.generate.choice" method to run too! But 99% of the time, I get this error.
I did some digging, and added some debug code:
When I run the working code, I see:
But the buggy code produces:
So, the function "create_states_mapping" is not returning the frozenset, so the tuple only has 2 of the 3 items to unpack!