-
- [ ] [language-model.md - vscode-docs [GitHub] - Visual Studio Code - GitHub](https://vscode.dev/github/microsoft/vscode-docs/blob/main/api/extension-guides/language-model.md)
# language-model.md -…
-
In PyTorch distributed training, I get:
```
File "/rwthfs/rz/cluster/home/az668407/setups/combined/2021-05-31/tools/returnn/returnn/torch/engine.py", line 198, in Engine.init_train_from_config
…
-
When I use an A750 to run BigDL and load the Qwen-7B int4 model, it reports that memory is exceeded. I don't know what's going on; is there a problem with my setup?
The following is the error m…
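For context, a minimal sketch of how Qwen-7B is typically loaded in int4 with bigdl-llm on an Intel Arc GPU (XPU), assuming the `bigdl.llm` transformers-style API; the checkpoint path and generation settings are placeholders, and whether the model fits in the A750's 8 GB also depends on context length and KV cache:

```python
# Hypothetical sketch: 4-bit loading of Qwen-7B with bigdl-llm on an Intel Arc (XPU) device.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the "xpu" device)
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/Qwen-7B-Chat"  # placeholder checkpoint path

# load_in_4bit=True converts the linear weights to int4 at load time,
# which is what keeps a 7B model within the A750's memory budget.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_4bit=True,
    trust_remote_code=True,
)
model = model.to("xpu")

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
inputs = tokenizer("Hello", return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```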
-
This is more of a request, but would you be able to support using custom embeddings and negative embeddings as pipeline arguments? The reason I want to do this is so I can use prompt engineering techn…
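A minimal sketch of what such an interface could look like, modeled on the `prompt_embeds` / `negative_prompt_embeds` call arguments that diffusers' `StableDiffusionPipeline` already accepts; the model ID and prompts below are placeholders:

```python
# Sketch: passing precomputed text embeddings instead of plain prompt strings.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def encode(text):
    # Build embeddings yourself (this is where prompt-engineering tricks such as
    # token weighting or embedding blending would go).
    tokens = pipe.tokenizer(
        text,
        padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    )
    return pipe.text_encoder(tokens.input_ids.to("cuda"))[0]

prompt_embeds = encode("a photo of an astronaut riding a horse")
negative_prompt_embeds = encode("blurry, low quality")

# Pass the embeddings directly as pipeline arguments.
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
).images[0]
```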
-
### Checklist
- [X] The issue exists after disabling all extensions
- [X] The issue exists on a clean installation of webui
- [ ] The issue is caused by an extension, but I believe it is caused b…
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A…
-
As the title indicates, I'd be interested in understanding whether this is just for text generation or whether it could also be used to expose the embedding function.
-
### Feature request
It would be nice to constrain the model output with a CFG directly when calling `model.generate`.
This is already done by llama.cpp [grammars](https://github.com/ggerganov/ll…
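A simplified sketch of the underlying technique as it can be done in transformers today: a custom `LogitsProcessor` that masks every token not allowed by the current grammar state. A real CFG constraint (as in llama.cpp grammars) would advance a parser state each step; here a fixed whitelist of digit tokens stands in for it:

```python
# Sketch: constraining generate() by masking disallowed tokens in a LogitsProcessor.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessor,
    LogitsProcessorList,
)

class AllowedTokensProcessor(LogitsProcessor):
    def __init__(self, allowed_token_ids):
        self.allowed = torch.tensor(sorted(allowed_token_ids))

    def __call__(self, input_ids, scores):
        mask = torch.full_like(scores, float("-inf"))
        mask[:, self.allowed] = 0.0
        return scores + mask  # disallowed tokens get -inf logits

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Toy "grammar": only tokens that decode to digits are allowed in the continuation.
allowed = [i for i in range(len(tokenizer)) if tokenizer.decode([i]).strip().isdigit()]

inputs = tokenizer("The answer is", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=8,
    logits_processor=LogitsProcessorList([AllowedTokensProcessor(allowed)]),
)
print(tokenizer.decode(out[0]))
```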
-
Hi.
I use Llama 3, and I'd like to stream the output.
I mean, it should work somehow through the port-8001 API. I'd like to generate a few tokens at a time and send them to the client as they arrive. Is it possible?
It could …
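A minimal sketch of the generation side of streaming with transformers' `TextIteratorStreamer`; how the chunks actually reach the client over the port-8001 API depends on that server (e.g. SSE or chunked HTTP), and the model ID below is a placeholder:

```python
# Sketch: consuming tokens incrementally while generate() runs in a background thread.
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
inputs = tokenizer("Tell me a short story.", return_tensors="pt").to(model.device)

# generate() blocks until completion, so run it in a thread and read the streamer.
thread = Thread(
    target=model.generate,
    kwargs=dict(**inputs, max_new_tokens=128, streamer=streamer),
)
thread.start()
for chunk in streamer:
    print(chunk, end="", flush=True)  # forward each chunk to the client here
thread.join()
```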