-
## What
When I follow the installation instructions in the readme (`pip install timething`), I'm unable to run the `align-long` example. My investigation has led me to believe it's because it's ins…
-
I'm running into the following error while running demo.sh
==
```
Running forward
+ CUDA_VISIBLE_DEVICES=0 python demo.py --batch_size_v 80 --num_workers 4 --forward_save_path demo/forward
Name…
-
### 🐛 Describe the bug
I get the following output:
```python
[W CudaIPCTypes.cpp:95] Producer process tried to deallocate over 1000 memory blocks referred by consumer processes. Deallocation migh…
-
### 🐛 Describe the bug
The delay between inference requests affects latency.
With a 0-second delay, bs=1 performance is best.
If the delay is 2s, the performance…
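A minimal sketch of how the effect described above could be measured, assuming a blocking inference call. The names `measure_latency` and `dummy_infer` are hypothetical and not part of the original report; `dummy_infer` stands in for a real bs=1 model call. For an asynchronous backend, the callable would need to wait until the result is actually ready, otherwise the timing only covers the launch.

```python
import time
from statistics import mean
from typing import Callable, List


def measure_latency(infer: Callable[[], object],
                    delay_s: float,
                    n_requests: int = 5) -> List[float]:
    """Time each call to `infer`, idling `delay_s` seconds between calls."""
    latencies = []
    for i in range(n_requests):
        start = time.perf_counter()
        infer()
        latencies.append(time.perf_counter() - start)
        if i < n_requests - 1:
            # Idle gap between requests; it is not included in the timing.
            time.sleep(delay_s)
    return latencies


def dummy_infer() -> int:
    # Hypothetical stand-in for a real bs=1 inference request.
    return sum(range(10_000))


if __name__ == "__main__":
    # Short delays keep the demo quick; use 2.0 to mirror the 2 s case.
    for delay in (0.0, 0.1):
        lat = measure_latency(dummy_infer, delay_s=delay)
        print(f"delay={delay}s mean latency={mean(lat) * 1e3:.3f} ms")
```

Comparing the mean per-request latency across delay values, with everything else held fixed, separates the cost of the request itself from any warm-up or power-state effect caused by the idle gap.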
-
## 🐛 Bug
The MLCEngine code from the quickstart guide fails on CPU with
> 'InternalError: Check failed: (it != n->end()) is false: cannot find the corresponding key in the Map'
followed by
> MLCE…
-
### Proposal to improve performance
I have observed that TTFT (time to first token) increases linearly with the total number of batched tokens.
For example, given a 100k batch
- TTFT is around 2min when an average prompt…
-
Not sure whether this is VirtualiZarr's or icechunk's fault, but the error was raised inside icechunk, so I'm raising it here.
### MCVE
```python
import xarray as xr
ds_original = xr.tutorial.ope…
-
### Software environment
```Markdown
- paddlepaddle:
- paddlepaddle-gpu: 3.0.0b1
- paddlenlp: https://github.com/ZHUI/PaddleNLP/tree/sci/benchmark
```
### Duplicate issue
- [X] I have searched the existing issues
### Error desc…
-
### System Info
PyTorch version: 2.6.0.dev20241101+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ub…
-
### 🐛 Describe the bug
As the picture below shows, `torch.index_select` outperforms regular indexing when the tensor size is small, but is outperformed when the size is large. Is this expected behavi…