-
_From @billti on November 1, 2015 6:10_
The API call `document.getWordRangeAtPosition(position)` appears to use its own definition of a word. For example, my tmLanguage defines `attrib-name` as a tok…
-
While running llama-3.1-70b across multiple Macs with exo + mlx, I found an error during quantization.
Location of the error:
the quantized.py file
Code:
def __call__(self, x):
s = x.shape
x = x.flatten()
out = mx.dequantize(
self["weight"][x],
scales=self["scales"][x],
biases=self["…
-
The HF documentation says that you can now export seq2seq to ONNX with the OnnxSeq2SeqConfigWithPast class.
https://huggingface.co/docs/transformers/v4.23.1/en/main_classes/onnx#onnx-configurations
…
-
### System Info
```shell
optimum==1.19.2
torch==2.1.2
transformers==4.39.3
onnxruntime-gpu==1.17.1
CUDA Version: 12.2
GPU: L4
```
### Who can help?
@michaelbenayoun
### Information
- [X] Th…
-
I have tested static cache inference, but the results are not as expected. The first two runs appear to be warm-up (torch compilation). The third run is fast as expected, but from the…
-
Hi team,
I'm using Ray and vLLM to serve `Qwen2-72B-Instruct` with 2 different methods:
- using `LLM` class
This is the recommended method for offline batch inference described in t…
-
I only copied the code from the README and installed the LLama NuGet package with the CPU-only backend, and it always returns
System.AccessViolationException: "Attempted to read or write protected …
-
I build and maintain a library for parsing property list files in Rust, [plist-rs](http://github.com/conradev/plist-rs), and I created benchmarks to compare it to the other common plist parsing librar…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
![Snipaste_2024-08-30_01-20-17](https://github.com/user-attachments/assets/29edc0c4-ac44-4ccf-b8d3-e82d…
-
### Describe the issue as clearly as possible:
When running the provided arithmetic grammar example with vLLM, I get an error `TypeError: Error in model execution: argument 'ids': 'list' object cannot …