-
- [ ] [Understanding the kernel in Semantic Kernel | Microsoft Learn](https://learn.microsoft.com/en-us/semantic-kernel/agents/kernel/?tabs=python)
# Understanding the kernel in Semantic Kernel | Mi…
-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is this question answered in the FAQ? | Is there an existing…
-
I use the whisper-timestamped backend because I could not get faster-whisper working. I also use whisper_streaming in Python code and simulate online mode by processing 1-second-long pieces of a…
-
As @bradfitz noted in the reviews of the original API, the `Decoder.ReadToken` API [is a garbage factory](https://go-review.googlesource.com/c/go/+/11651/2/src/encoding/json/stream.go#279). Although, …
-
>The idea is simple. Ask Claude to repeat some text and observe how the generation is streamed through the network. It turns out that Anthropic serves one token at a time!
```
python anthropic_tok…
```
-
Some generators are expensive to compute but can be reused. Perhaps functions that return generators could be cached (say, every time the generator's `__next__` is called or until `StopIteration`) so …
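
One way to picture the caching idea is a wrapper that records each item the first time the underlying generator produces it, then replays the recorded items on later iterations. This is only a minimal sketch: `ReplayableIterator` and `cached_generator` are hypothetical names made up for illustration, not an existing API. It also assumes the generator's arguments are hashable and that the generator is deterministic.

```python
import functools


class ReplayableIterator:
    """Hypothetical helper: remembers every item the wrapped iterator yields."""

    def __init__(self, iterator):
        self._source = iterator
        self._cache = []          # items produced so far
        self._exhausted = False   # True once the source raised StopIteration

    def __iter__(self):
        index = 0
        while True:
            if index < len(self._cache):
                # Already computed: replay from the cache.
                yield self._cache[index]
            elif self._exhausted:
                return
            else:
                # Not computed yet: advance the real generator one step.
                try:
                    item = next(self._source)
                except StopIteration:
                    self._exhausted = True
                    return
                self._cache.append(item)
                yield item
            index += 1


def cached_generator(func):
    """Hypothetical decorator: one shared ReplayableIterator per argument tuple."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = ReplayableIterator(func(*args))
        return cache[args]

    return wrapper


calls = []


@cached_generator
def squares(n):
    for i in range(n):
        calls.append(i)  # records each time the generator body really runs
        yield i * i


first = list(squares(4))
second = list(squares(4))  # served from the cache; the body does not run again
```

The cache here grows without bound and keeps every yielded item alive, so a real design would need an eviction policy and a decision about generators that are infinite or have side effects.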
-
Running llama-3.1-70b across multiple Macs with exo + mlx, I hit an error during quantization.
Location of the error:
the quantized.py file
Code:
def call(self, x):
s = x.shape
x = x.flatten()
out = mx.dequantize(
self["weight"][x],
scales=self["scales"][x],
biases=self["…
-
**Description**
I successfully deployed a Triton backend for a Baichuan TensorRT engine, but got a segmentation fault during streaming inference.
**Triton Information**
I start the triton contain…
-
Hello,
Thank you for sharing your work.
I'm interested in evaluating alpaca-lora on QA tasks. I started with the BoolQ dataset. I followed the `generate.py` script and constructed a prompt that work…
-
### Python -VV
```shell
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
```
### Pip Freeze
```shell
accelerate==0.33.0
addict==2.4.0
annotated-types==0.7.0
apex @ file:///data2/apex
…