Describe the bug
As a stopgap measure recommended in issue #299, I installed OpenLLM with the --no-binary flag and tried to launch and query a LLaMA 13B model. This resulted in a Torch error.
I understand this is secondary to issue #299; please disregard this report if the patch for #299 also fixes it.
To reproduce
Installing OpenLLM:
```
pip install -U --no-binary openllm-core "openllm[llama, vllm, fine-tune]"
pip install scipy
pip install protobuf==3.20.3  # needed to work around a common protobuf compilation-incompatibility error
```
Launching the service
Querying the model (from another terminal)
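The launch and query steps used the OpenLLM CLI, roughly as sketched below; the model ID is a placeholder rather than the exact one from this run:
```
# First terminal: launch the LLaMA 13B service (model ID is a placeholder)
openllm start llama --model-id huggyllama/llama-13b

# Second terminal: query the running service
openllm query 'What is deep learning ?'
```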
Logs
From the first terminal (where the model was launched):
2023-09-06T11:38:16+0000 [INFO] [runner:llm-llama-runner:1] - "GET /readyz HTTP/1.1" 200 (trace=9d00197e5e9b5c00ce7a730f7cd9083e,span=e56f8391327279cb,sampled=1,service.name=llm-llama-runner)
2023-09-06T11:38:16+0000 [INFO] [runner:llm-llama-runner:1] _ (scheme=http,method=GET,path=/readyz,type=,length=) (status=200,type=text/plain; charset=utf-8,length=1) 0.638ms (trace=9d00197e5e9b5c00ce7a730f7cd9083e,span=1c975528a02988f7,sampled=1,service.name=llm-llama-runner)
2023-09-06T11:38:16+0000 [INFO] [api_server:29] 127.0.0.1:46644 - "GET /readyz HTTP/1.1" 200 (trace=9d00197e5e9b5c00ce7a730f7cd9083e,span=1e20d81925215734,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [api_server:29] 127.0.0.1:46644 (scheme=http,method=GET,path=/readyz,type=,length=) (status=200,type=text/plain; charset=utf-8,length=1) 79.659ms (trace=9d00197e5e9b5c00ce7a730f7cd9083e,span=6767a7ec578eef9a,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [runner:llm-llama-runner:1] - "GET /readyz HTTP/1.1" 200 (trace=76f5c09186ad1bf26becd83971437d3c,span=1d3f82fb9d3c9aa0,sampled=1,service.name=llm-llama-runner)
2023-09-06T11:38:16+0000 [INFO] [runner:llm-llama-runner:1] _ (scheme=http,method=GET,path=/readyz,type=,length=) (status=200,type=text/plain; charset=utf-8,length=1) 0.422ms (trace=76f5c09186ad1bf26becd83971437d3c,span=870b58580b00b9f8,sampled=1,service.name=llm-llama-runner)
2023-09-06T11:38:16+0000 [INFO] [api_server:29] 127.0.0.1:46648 - "GET /readyz HTTP/1.1" 200 (trace=76f5c09186ad1bf26becd83971437d3c,span=f77657db2f5062f5,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [api_server:29] 127.0.0.1:46648 (scheme=http,method=GET,path=/readyz,type=,length=) (status=200,type=text/plain; charset=utf-8,length=1) 2.945ms (trace=76f5c09186ad1bf26becd83971437d3c,span=959446bff37e8b99,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [api_server:30] 127.0.0.1:46654 - "GET /docs.json HTTP/1.1" 200 (trace=57df2df3885999108733004f8e8d9f02,span=456ac29ce91bed6f,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [api_server:30] 127.0.0.1:46654 (scheme=http,method=GET,path=/docs.json,type=,length=) (status=200,type=application/json,length=11034) 14.689ms (trace=57df2df3885999108733004f8e8d9f02,span=141a51bdcab81cbf,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [api_server:30] 127.0.0.1:46664 - "POST /v1/metadata HTTP/1.1" 200 (trace=b05deaf686c0a32f1b08241cfd2dbcc7,span=d92c0a230b8fae10,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [api_server:30] 127.0.0.1:46664 (scheme=http,method=POST,path=/v1/metadata,type=text/plain; charset=utf-8,length=0) (status=200,type=application/json,length=907) 2.542ms (trace=b05deaf686c0a32f1b08241cfd2dbcc7,span=23a1663e7f42ea11,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [api_server:30] 127.0.0.1:46664 - "POST /v1/metadata HTTP/1.1" 200 (trace=d2d69c5af53bba442e26ac5ad983a07f,span=a088f3943117b389,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [api_server:30] 127.0.0.1:46664 (scheme=http,method=POST,path=/v1/metadata,type=text/plain; charset=utf-8,length=0) (status=200,type=application/json,length=907) 0.927ms (trace=d2d69c5af53bba442e26ac5ad983a07f,span=5c6b6c748803f63b,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [api_server:30] 127.0.0.1:46664 - "POST /v1/metadata HTTP/1.1" 200 (trace=3a7362ecad91005829ef300bf392c97c,span=037e3e619c588aa0,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [api_server:30] 127.0.0.1:46664 (scheme=http,method=POST,path=/v1/metadata,type=text/plain; charset=utf-8,length=0) (status=200,type=application/json,length=907) 0.908ms (trace=3a7362ecad91005829ef300bf392c97c,span=6f6476058710b337,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [api_server:30] 127.0.0.1:46664 - "POST /v1/metadata HTTP/1.1" 200 (trace=efc1c27684ee8f6dfcb4046c3c3fe2da,span=ccc6f6b1ac9e78e5,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [api_server:30] 127.0.0.1:46664 (scheme=http,method=POST,path=/v1/metadata,type=text/plain; charset=utf-8,length=0) (status=200,type=application/json,length=907) 1.076ms (trace=efc1c27684ee8f6dfcb4046c3c3fe2da,span=a38eb0361f6979e0,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [api_server:30] 127.0.0.1:46664 - "POST /v1/metadata HTTP/1.1" 200 (trace=b07fe0780a98ab7402526950745d5b1d,span=3ba5a16f6eacc14d,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [api_server:30] 127.0.0.1:46664 (scheme=http,method=POST,path=/v1/metadata,type=text/plain; charset=utf-8,length=0) (status=200,type=application/json,length=907) 0.898ms (trace=b07fe0780a98ab7402526950745d5b1d,span=1bb9f49c56688f0a,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [DEBUG] [runner:llm-llama-runner:1] Starting dispatcher optimizer training... (trace=e7f29e8091c87e8c4ce71c7dc49946fe,span=8ec7bd686cbd6de6,sampled=1,service.name=llm-llama-runner)
2023-09-06T11:38:16+0000 [DEBUG] [runner:llm-llama-runner:1] Dynamic batching cork released, batch size: 1 (trace=e7f29e8091c87e8c4ce71c7dc49946fe,span=8ec7bd686cbd6de6,sampled=1,service.name=llm-llama-runner)
2023-09-06T11:38:16+0000 [INFO] [runner:llm-llama-runner:1] - "POST /generate HTTP/1.1" 500
2023-09-06T11:38:16+0000 [ERROR] [runner:llm-llama-runner:1] Exception in ASGI application
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/ubuntu/.local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
return await self.app(scope, receive, send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/starlette/applications.py", line 122, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/home/ubuntu/.local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/server/http/traffic.py", line 26, in __call__
await self.app(scope, receive, send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 580, in __call__
await self.app(scope, otel_receive, otel_send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/server/http/instruments.py", line 252, in __call__
await self.app(scope, receive, wrapped_send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/server/http/access.py", line 126, in __call__
await self.app(scope, receive, wrapped_send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/home/ubuntu/.local/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/home/ubuntu/.local/lib/python3.8/site-packages/starlette/routing.py", line 718, in __call__
await route.handle(scope, receive, send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/server/runner_app.py", line 291, in _request_handler
payload = await infer(params)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/marshal/dispatcher.py", line 182, in _func
raise r
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/marshal/dispatcher.py", line 377, in outbound_call
outputs = await self.callback(
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/server/runner_app.py", line 271, in infer_single
ret = await runner_method.async_run(*params.args, **params.kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/runner/runner.py", line 55, in async_run
return await self.runner._runner_handle.async_run_method(self, *args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 62, in async_run_method
return await anyio.to_thread.run_sync(
File "/home/ubuntu/.local/lib/python3.8/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/ubuntu/.local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/ubuntu/.local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/runner/runnable.py", line 143, in method
return self.func(obj, *args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/openllm/_llm.py", line 1185, in generate
return self.generate(prompt, **attrs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/openllm/_llm.py", line 943, in generate
for it in self.generate_iterator(prompt, **attrs):
File "/home/ubuntu/.local/lib/python3.8/site-packages/openllm/_llm.py", line 981, in generate_iterator
out = self.model(torch.as_tensor([input_ids]), use_cache=True)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 820, in forward
outputs = self.model(
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 662, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
2023-09-06T11:38:16+0000 [ERROR] [api_server:30] Exception on /v1/generate [POST] (trace=e7f29e8091c87e8c4ce71c7dc49946fe,span=533d6a8d3e4c547d,sampled=1,service.name=llm-llama-service)
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
output = await api.func(*args)
File "/home/ubuntu/.local/lib/python3.8/site-packages/openllm/_service.py", line 50, in generate_v1
responses = await runner.generate.async_run(qa_inputs.prompt, **{'adapter_name': qa_inputs.adapter_name, **config})
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/runner/runner.py", line 55, in async_run
return await self.runner._runner_handle.async_run_method(self, *args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 242, in async_run_method
raise RemoteException(
bentoml.exceptions.RemoteException: An unexpected exception occurred in remote runner llm-llama-runner: [500] Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/server/http/traffic.py", line 26, in __call__
await self.app(scope, receive, send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 580, in __call__
await self.app(scope, otel_receive, otel_send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/server/http/instruments.py", line 252, in __call__
await self.app(scope, receive, wrapped_send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/server/http/access.py", line 126, in __call__
await self.app(scope, receive, wrapped_send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/home/ubuntu/.local/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/home/ubuntu/.local/lib/python3.8/site-packages/starlette/routing.py", line 718, in __call__
await route.handle(scope, receive, send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/home/ubuntu/.local/lib/python3.8/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/server/runner_app.py", line 291, in _request_handler
payload = await infer(params)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/marshal/dispatcher.py", line 182, in _func
raise r
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/marshal/dispatcher.py", line 377, in outbound_call
outputs = await self.callback(
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/server/runner_app.py", line 271, in infer_single
ret = await runner_method.async_run(*params.args, **params.kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/runner/runner.py", line 55, in async_run
return await self.runner._runner_handle.async_run_method(self, *args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 62, in async_run_method
return await anyio.to_thread.run_sync(
File "/home/ubuntu/.local/lib/python3.8/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/ubuntu/.local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/ubuntu/.local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bentoml/_internal/runner/runnable.py", line 143, in method
return self.func(obj, *args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/openllm/_llm.py", line 1185, in generate
return self.generate(prompt, **attrs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/openllm/_llm.py", line 943, in generate
for it in self.generate_iterator(prompt, **attrs):
File "/home/ubuntu/.local/lib/python3.8/site-packages/openllm/_llm.py", line 981, in generate_iterator
out = self.model(torch.as_tensor([input_ids]), use_cache=True)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 820, in forward
outputs = self.model(
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 662, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
2023-09-06T11:38:16+0000 [INFO] [api_server:30] 127.0.0.1:46664 - "POST /v1/generate HTTP/1.1" 500 (trace=e7f29e8091c87e8c4ce71c7dc49946fe,span=50be2c0a17a3dd19,sampled=1,service.name=llm-llama-service)
2023-09-06T11:38:16+0000 [INFO] [api_server:30] 127.0.0.1:46664 (scheme=http,method=POST,path=/v1/generate,type=application/json,length=719) (status=500,type=application/json,length=2) 146.100ms (trace=e7f29e8091c87e8c4ce71c7dc49946fe,span=533d6a8d3e4c547d,sampled=1,service.name=llm-llama-service)
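The root failure on the server side is the device mismatch raised from generate_iterator in openllm/_llm.py: the input-ID tensor is created on CPU while the model weights live on cuda:0. A minimal sketch of the kind of change that would avoid the mismatch (my assumption, not the project's actual fix) is:
```python
import torch

def forward_on_model_device(model, input_ids):
    # Hypothetical helper, not OpenLLM code: move the input-ID tensor onto the
    # device that holds the model weights before the forward pass, which is
    # what the failing line in generate_iterator appears to omit.
    device = next(model.parameters()).device
    return model(torch.as_tensor([input_ids], device=device), use_cache=True)
```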
From the second terminal (where the model was queried):
==Input==
What is deep learning ?
Traceback (most recent call last):
File "/home/ubuntu/.local/bin/openllm", line 8, in <module>
sys.exit(cli())
File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/openllm/cli/entrypoint.py", line 189, in wrapper
return_value = func(*args, **attrs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/openllm/cli/entrypoint.py", line 171, in wrapper
return f(*args, **attrs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/openllm/cli/entrypoint.py", line 868, in query_command
res = client.query(prompt, return_response='raw', **{**client.configuration, **_memoized})
File "/home/ubuntu/.local/lib/python3.8/site-packages/openllm_client/_base.py", line 269, in query
r = openllm_core.GenerationOutput(**self.call('generate', openllm_core.GenerationInput(prompt=prompt, llm_config=self.config.model_construct_env(**generate_kwargs)).model_dump()))
File "/home/ubuntu/.local/lib/python3.8/site-packages/openllm_client/_base.py", line 165, in call
return self.inner.call(f'{api_name}_{self._api_version}', *args, **attrs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/openllm_client/benmin/__init__.py", line 43, in call
return self._call(data, _inference_api=self.svc.apis[bentoml_api_name], **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/openllm_client/benmin/_http.py", line 104, in _call
if resp.status_code != 200: raise ValueError(f'Error while making request: {resp.status_code}: {resp.content!s}')
ValueError: Error while making request: 500: b'""'
Environment
Environment variable
System information
bentoml: 1.1.5
python: 3.8.10
platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.29
uid_gid: 1000:1000
pip_packages:
``` absl-py==0.15.0 accelerate==0.22.0 aiofiles==22.1.0 aiohttp==3.8.5 aiosignal==1.3.1 aiosqlite==0.18.0 anyio==3.7.1 appdirs==1.4.3 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 arrow==1.2.3 asgiref==3.7.2 astunparse==1.6.2 async-timeout==4.0.3 atomicwrites==1.1.5 attrs==23.1.0 Automat==0.8.0 Babel==2.12.1 backcall==0.1.0 beautifulsoup4==4.8.2 bentoml==1.1.5 bitsandbytes==0.41.1 bleach==3.1.1 blinker==1.4 blosc==1.7.0 bottle==0.12.15 build==1.0.0 cachetools==4.0.0 caffe==1.0.0 cattrs==23.1.2 certifi==2019.11.28 cffi==1.14.0 chardet==3.0.4 charset-normalizer==3.1.0 circus==0.18.0 click==8.1.7 click-option-group==0.5.6 cloud-init==22.4.2 cloudpickle==2.2.1 cmake==3.27.4.1 colorama==0.4.3 coloredlogs==15.0.1 command-not-found==0.3 configobj==5.0.6 constantly==15.1.0 contextlib2==21.6.0 contourpy==1.0.7 cryptography==2.8 ctop==1.0.0 cuda-python==12.2.0 cycler==0.10.0 Cython==0.29.14 dask==2.8.1+dfsg datasets==2.14.4 dbus-python==1.2.16 decorator==4.4.2 deepmerge==1.1.0 defusedxml==0.6.0 Deprecated==1.2.14 dill==0.3.7 distlib==0.3.0 distro==1.4.0 distro-info===0.23ubuntu1 docker==4.1.0 entrypoints==0.3 et-xmlfile==1.0.1 exceptiongroup==1.1.3 fairscale==0.4.13 fastapi==0.103.1 fastcore==1.5.29 fastjsonschema==2.16.3 filelock==3.0.12 filetype==1.2.0 flake8==3.7.9 flatbuffers==1.12 fonttools==4.39.0 fqdn==1.5.1 frozenlist==1.4.0 fs==2.4.16 fsspec==2023.9.0 future==0.18.2 gast==0.4.0 ghapi==1.0.4 Glances==3.1.3 google-auth==1.5.1 google-auth-oauthlib==0.4.1 google-pasta==0.2.0 grpcio==1.57.0 h11==0.14.0 h5py==2.10.0 html5lib==1.0.1 htmlmin==0.1.12 httpcore==0.17.3 httplib2==0.14.0 httpx==0.24.1 huggingface-hub==0.16.4 humanfriendly==10.0 hyperlink==19.0.0 icdiff==1.9.5 idna==2.8 ImageHash==4.3.1 imageio==2.4.1 importlib-metadata==6.0.0 importlib-resources==5.12.0 incremental==16.10.1 inflection==0.5.1 influxdb==5.2.0 iotop==0.6 ipykernel==5.2.0 ipython==7.13.0 ipython_genutils==0.2.0 ipywidgets==8.0.4 isoduration==20.11.0 jdcal==1.0 jedi==0.15.2 Jinja2==3.1.2 joblib==1.2.0 json5==0.9.11 jsonpatch==1.22 jsonpointer==2.0 jsonschema==4.17.3 jupyter-console==6.0.0 jupyter-events==0.6.3 jupyter-ydoc==0.2.3 jupyter_client==8.0.3 jupyter_core==5.2.0 jupyter_server==2.4.0 jupyter_server_fileid==0.8.0 jupyter_server_terminals==0.4.4 jupyter_server_ydoc==0.6.1 jupyterlab==3.6.1 jupyterlab-pygments==0.2.2 jupyterlab-widgets==3.0.5 jupyterlab_server==2.20.0 kaptan==0.5.10 keras==2.11.0 keyring==18.0.1 kiwisolver==1.0.1 language-selector==0.1 launchpadlib==1.10.13 lazr.restfulclient==0.14.2 lazr.uri==1.0.3 libtmux==0.8.2 lit==16.0.6 locket==0.2.0 lxml==4.5.0 Mako==1.1.0 Markdown==3.1.1 markdown-it-py==3.0.0 MarkupSafe==2.1.2 matplotlib==3.6.3 mccabe==0.6.1 mdurl==0.1.2 mistune==2.0.5 more-itertools==4.2.0 mpi4py==3.0.3 mpmath==1.3.0 msgpack==1.0.5 multidict==6.0.4 multimethod==1.9.1 multiprocess==0.70.15 mypy-extensions==1.0.0 nbclassic==0.5.3 nbclient==0.7.2 nbconvert==7.2.9 nbformat==5.7.3 nest-asyncio==1.5.6 netifaces==0.10.4 networkx==2.4 ninja==1.11.1 nose==1.3.7 notebook==6.0.3 notebook_shim==0.2.2 numexpr==2.7.1 numpy==1.23.5 nvidia-cublas-cu11==11.10.3.66 nvidia-cuda-cupti-cu11==11.7.101 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 nvidia-cufft-cu11==10.9.0.58 nvidia-curand-cu11==10.2.10.91 nvidia-cusolver-cu11==11.4.0.1 nvidia-cusparse-cu11==11.7.4.91 nvidia-ml-py==7.352.0 nvidia-nccl-cu11==2.14.3 nvidia-nvtx-cu11==11.7.91 oauthlib==3.1.0 olefile==0.46 openllm==0.3.0 openllm-client==0.3.0 openllm-core==0.3.0 openpyxl==3.0.3 opentelemetry-api==1.18.0 
opentelemetry-instrumentation==0.39b0 opentelemetry-instrumentation-aiohttp-client==0.39b0 opentelemetry-instrumentation-asgi==0.39b0 opentelemetry-sdk==1.18.0 opentelemetry-semantic-conventions==0.39b0 opentelemetry-util-http==0.39b0 opt-einsum==3.3.0 optimum==1.12.0 orjson==3.9.5 packaging==23.0 pandas==1.5.3 pandas-profiling==3.6.6 pandocfilters==1.4.2 parameterized==0.7.0 parso==0.5.2 partd==1.0.0 pathspec==0.11.2 patsy==0.5.3 peft==0.5.0 pexpect==4.6.0 phik==0.12.3 pickleshare==0.7.5 Pillow==7.0.0 pip-requirements-parser==32.0.1 pip-tools==7.3.0 pkgutil_resolve_name==1.3.10 platformdirs==3.1.1 pluggy==0.13.0 ply==3.11 prometheus-client==0.17.1 prompt-toolkit==2.0.10 protobuf==3.20.3 psutil==5.5.1 ptyprocess==0.7.0 py==1.8.1 pyarrow==13.0.0 pyasn1==0.4.2 pyasn1-modules==0.2.1 pycodestyle==2.5.0 pycparser==2.19 pycryptodomex==3.6.1 pycuda==2019.1.2 pydantic==1.10.6 pydot==1.4.1 pyflakes==2.1.1 Pygments==2.14.0 PyGObject==3.36.0 pygpu==0.7.6 PyHamcrest==1.9.0 pyinotify==0.9.6 PyJWT==1.7.1 pymacaroons==0.13.0 PyNaCl==1.3.0 pynvml==11.5.0 pyOpenSSL==19.0.0 pyparsing==2.4.6 pyproject_hooks==1.0.0 pyrsistent==0.15.5 pyserial==3.4 pysmi==0.3.2 pysnmp==4.4.6 pystache==0.5.4 pytest==4.6.9 python-apt==2.0.1 python-dateutil==2.8.2 python-debian===0.1.36ubuntu1 python-json-logger==2.0.7 python-multipart==0.0.6 pytools==2019.1.1 pytz==2022.7.1 PyWavelets==0.5.1 PyYAML==5.3.1 pyzmq==25.0.1 ray==2.6.3 regex==2023.8.8 requests==2.28.2 requests-oauthlib==1.0.0 requests-unixsocket==0.2.0 rfc3339-validator==0.1.4 rfc3986-validator==0.1.1 rich==13.5.2 rsa==4.0 safetensors==0.3.3 schema==0.7.5 scikit-cuda==0.5.3 scikit-image==0.16.2 scikit-learn==0.22.2.post1 scipy==1.9.3 seaborn==0.12.2 SecretStorage==2.3.1 Send2Trash==1.8.0 sentencepiece==0.1.99 service-identity==18.1.0 simple-di==0.1.5 simplejson==3.16.0 six==1.14.0 sniffio==1.3.0 sos==4.4 soupsieve==1.9.5 ssh-import-id==5.10 starlette==0.27.0 statsmodels==0.13.5 sympy==1.12 systemd-python==234 tables==3.6.1 tabulate==0.9.0 tangled-up-in-unicode==0.2.0 tensorboard==2.11.0 tensorflow-estimator==2.11.0 tensorflow-gpu==2.11.0 termcolor==1.1.0 terminado==0.17.1 testpath==0.4.4 Theano==1.0.4 tinycss2==1.2.1 tmuxp==1.5.4 tokenizers==0.13.3 tomli==2.0.1 toolz==0.9.0 torch==2.0.1 torchvision==0.14.1 tornado==6.2 tqdm==4.64.1 traitlets==5.9.0 transformers==4.33.0 triton==2.0.0 trl==0.7.1 Twisted==18.9.0 typeguard==2.13.3 typing_extensions==4.5.0 ubuntu-advantage-tools==8001 ufw==0.36 unattended-upgrades==0.1 uri-template==1.2.0 urllib3==1.25.8 uvicorn==0.23.2 virtualenv==20.0.17 visions==0.7.5 vllm==0.1.4 wadllib==1.3.3 watchfiles==0.20.0 wcwidth==0.1.8 webcolors==1.12 webencodings==0.5.1 websocket-client==0.53.0 Werkzeug==0.16.1 widgetsnbextension==4.0.5 wrapt==1.11.2 xformers==0.0.21 xlrd==1.1.0 xlwt==1.3.0 xxhash==3.3.0 y-py==0.5.9 yarl==1.9.2 ydata-profiling==4.1.0 ypy-websocket==0.8.2 zipp==3.15.0 zope.interface==4.7.1 ```
transformers version: 4.33.0
System information (Optional)
A100 40GB SXM4 instance from LambdaLabs