captify-sivakhno opened this issue 4 days ago
I actually ran into a similar issue when trying to add CFG support to https://github.com/huggingface/text-generation-inference: the same error message, with the same code path through the last three function calls (see the trace below). Any hints would be appreciated.
2024-10-31T06:56:33.558057Z ERROR text_generation_launcher: Method Decode encountered an error.
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.11/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.11/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.11/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.11/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.11/site-packages/text_generation_server/cli.py", line 116, in serve
    server.serve(
  File "/opt/conda/lib/python3.11/site-packages/text_generation_server/server.py", line 315, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
  File "/opt/conda/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.11/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/lib/python3.11/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
    return await self.intercept(
> File "/opt/conda/lib/python3.11/site-packages/text_generation_server/interceptor.py", line 24, in intercept
    return await response
  File "/opt/conda/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 120, in _unary_interceptor
    raise error
  File "/opt/conda/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 111, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/opt/conda/lib/python3.11/site-packages/text_generation_server/server.py", line 218, in Decode
    generations, next_batch, timings = self.model.generate_token(batch)
  File "/opt/conda/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
  File "/opt/conda/lib/python3.11/site-packages/text_generation_server/models/flash_causal_lm.py", line 1968, in generate_token
    ) = batch.next_token_chooser(
  File "/opt/conda/lib/python3.11/site-packages/text_generation_server/utils/tokens.py", line 364, in __call__
    _scores = self.grammar_processor(_scores, self.fsm_grammar_states)
  File "/opt/conda/lib/python3.11/site-packages/text_generation_server/utils/logits_process.py", line 597, in __call__
    allowed_tokens = fsm.get_next_instruction(fsm_grammar_states[i]).tokens
  File "/opt/conda/lib/python3.11/site-packages/outlines/fsm/guide.py", line 154, in get_next_instruction
    valid_tokens = list(
  File "/opt/conda/lib/python3.11/site-packages/outlines/fsm/guide.py", line 189, in iter_valid_token_ids
    self._get_parser_state_token_applied(state, int(token_id))
  File "/opt/conda/lib/python3.11/site-packages/outlines/fsm/guide.py", line 241, in _get_parser_state_token_applied
    prev_token_str = self.tokenizer.decode([[state.prev_token]])[0]
  File "/opt/conda/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3999, in decode
    return self._decode(
  File "/opt/conda/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 654, in _decode
    text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
TypeError: argument 'ids': 'list' object cannot be interpreted as an integer
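For what it's worth, the failing call shape reproduces with nothing but a fast transformers tokenizer; no TGI needed. Below is a hypothetical minimal sketch (the tokenizer choice is arbitrary, not from the trace). If I read guide.py right, the nested list in _get_parser_state_token_applied assumes outlines' own tokenizer adapter, whose decode is batch-oriented, whereas the TGI integration hands it a raw transformers tokenizer whose decode expects a flat list of ids.

```python
# Hypothetical repro, independent of TGI/vLLM (tokenizer choice is arbitrary).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # any fast (Rust-backed) tokenizer

print(tok.decode([464]))             # OK: decode() takes a flat list of ids
print(tok.batch_decode([[464]])[0])  # OK: batch_decode() takes a list of lists

try:
    tok.decode([[464]])  # the nested shape from guide.py line 241 above
except TypeError as exc:
    # TypeError: argument 'ids': 'list' object cannot be interpreted as an integer
    print(exc)
```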
It's hard to know whether this is an issue on their end or ours. Running the same thing in outlines directly should tell us.
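A minimal sketch of that check, using outlines' documented CFG entry points (the model name is an arbitrary small example; exact APIs may differ across outlines versions):

```python
# Hedged sketch: run the bundled arithmetic grammar through outlines directly,
# with no TGI/vLLM layer in between, to see whether the decode call survives.
import outlines
from outlines import grammars

model = outlines.models.transformers("gpt2")  # any small HF model will do
generator = outlines.generate.cfg(model, grammars.arithmetic)
print(generator("Write a sum: "))
```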
Describe the issue as clearly as possible:
When running the provided arithmetic grammar example with vLLM, I get the error TypeError: Error in model execution: argument 'ids': 'list' object cannot be interpreted as an integer. I presume this comes from de-tokenization, but I am still not sure how to fix it. Any suggestions would be welcome; we have used outlines with vLLM successfully on a number of other use cases and really like the tool!
Steps/code to reproduce the bug:
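A hedged sketch of the kind of invocation described above (not the reporter's actual code; the model choice is arbitrary, and CFG support for the vLLM backend varies by outlines version):

```python
# Hedged sketch of the setup described above: the bundled arithmetic grammar
# driven through outlines' vLLM integration. Not the reporter's actual code;
# CFG support for this backend depends on the installed outlines version.
import outlines
from outlines import grammars

model = outlines.models.vllm("gpt2")  # any model loadable by vLLM
generator = outlines.generate.cfg(model, grammars.arithmetic)
print(generator("1 + 1 = "))
```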
Expected result:
Error message:
Outlines/Python version information:
Context for the issue:
No response