-
### The model to consider.
https://huggingface.co/THUDM/glm-4-9b-chat
### The closest model vllm already supports.
chatglm
### What's your difficulty of supporting the model you want?
_No respons…
-
When I tested Qwen2-7B with this library, it reported some errors.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from intel_npu_acceleration_library import NPUModelForCausalL…
-
## Describe the bug
Hi. I am using this dataloader, which processes large datasets in streaming mode, as described in one of the Hugging Face examples. I am using it to read C4: https://github.com/…
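Independent of the exact Hugging Face example, the core idea of a streaming dataloader can be sketched in plain Python — consume an (effectively unbounded) iterator in fixed-size batches without ever materializing the whole dataset. The function name and batch size here are illustrative, not from the example code:

```python
from itertools import islice

def batched(stream, batch_size):
    """Yield fixed-size batches from an iterator lazily.

    This is the essence of streaming mode: only `batch_size` examples
    are held in memory at a time, so a corpus as large as C4 can be
    processed without downloading or loading it all at once.
    """
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Pretend this generator is a huge corpus being read lazily:
corpus = (f"doc-{i}" for i in range(7))
batches = list(batched(corpus, batch_size=3))  # -> 3 batches: 3 + 3 + 1 docs
```

With the real library you would pass `streaming=True` to the dataset loader and iterate the same way.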
-
There are several places where runtime performance could be improved:
* Use local variables as proxy for class variables
* Tokenizers split vs regex vs merged sentencize/tokenizer
* Compile regexes
* Async processi…
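Two of the points above — compiling regexes once and using local variables to shortcut repeated attribute lookups — can be illustrated with a small sketch (the names below are illustrative, not from the codebase):

```python
import re

# Compile once at module level instead of re-parsing the pattern per call.
WORD = re.compile(r"\w+")

def count_words(texts):
    # Bind the method to a local name: inside a hot loop this avoids
    # repeated attribute lookups on the pattern object each iteration.
    findall = WORD.findall
    return sum(len(findall(t)) for t in texts)

count_words(["one two", "three"])  # -> 3
```

The local-alias trick matters most in tight loops over many items; for a handful of calls the difference is negligible.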
-
Hi, when I run Orion-14B-Chat-Int4 with the following code on an A800-80G
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
model_name = "OrionStarAI/Orion-…
-
Webpacking `short-order` fails given its use of `fs`, `dotenv` and `readline-sync`, which are used specifically in a Node runtime. If these packages are not core to `short-order`, …
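Until that is resolved, one common workaround (a sketch, not a confirmed fix for `short-order`) is to tell webpack these Node-only modules should not be bundled, by targeting Node and declaring them as externals:

```javascript
// webpack.config.js — assumes the bundle runs in a Node runtime, so the
// modules can be required at runtime instead of being bundled.
module.exports = {
  target: "node",
  externals: {
    fs: "commonjs fs",
    dotenv: "commonjs dotenv",
    "readline-sync": "commonjs readline-sync",
  },
};
```

If the bundle must run in a browser instead, these modules would have to be stubbed out (e.g. via `resolve.fallback`) rather than externalized, since no Node runtime is available there.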
-
I've tried the following in the .env.local file but get parsing errors (Error: Parse error on line 7:
...ssistant}})
"chatPromptTemplate" : "system\r\n\r\n{{ You are a friendly assistant }}user\r\…
-
Estimating the cost of each LLM call is a handy tool, but if we start supporting non-OpenAI models we might want the ability to do that in a more general way. Specifically, GooseAI (adding in #30) doe…
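One generic shape for this is a provider-agnostic price table keyed by (provider, model), with the estimator just looking up a rate. The prices and names below are illustrative placeholders, not real OpenAI or GooseAI pricing:

```python
# Hypothetical per-1K-token rates; a real implementation would load these
# from configuration or the provider's published pricing.
PRICE_PER_1K_TOKENS = {
    ("openai", "gpt-3.5-turbo"): 0.002,
    ("gooseai", "gpt-neo-20b"): 0.00265,
}

def estimate_cost(provider, model, prompt_tokens, completion_tokens):
    """Estimate the dollar cost of one LLM call from token counts."""
    rate = PRICE_PER_1K_TOKENS[(provider, model)]
    return (prompt_tokens + completion_tokens) / 1000 * rate
```

Providers that price prompt and completion tokens at different rates would need two entries per model, but the lookup structure stays the same.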
-
`src/main.rs`:
```rust
use std::fs;
use std::io::Cursor;
use std::default::Default;
extern crate html5ever;
use html5ever::parse_document;
use html5ever::driver::ParseOpts;
use html5ever…
-
Here's my issue when I try to use streaming mode; I'm on Windows:
2024-01-19 13:33:23.733 | WARNING | __mp_main__::78 - 'Streaming Mode' has certain limitations, you can read about them here …