-
When I tried to reproduce the original model results of mistral-7b-v0.2 without `flash-attn`, I got this error:
```
Traceback (most recent call last):
File "/home/yuanye/long_llm/InfLLM/benchmark/pr…
```
-
We want to move from pickled objects saved by `torch` or `torch.jit` to the safetensors format for the weights of `docling-ibm-models`. This has various advantages, such as better security, and also acts …
-
### 🚀 The feature, motivation and pitch
For this code:
```python
import torch
class DummyModule(torch.nn.Module):
    def __init__(self):
        super(DummyModule, self).__init__()
        …
```
-
When I run validate.py, I encounter the following error:
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:05
-
Hi, when I try to run your demo in the PiA part, I get an error at the 'instruction tuning' step:
```
root@0de6f5c3da0f:/workspace/zt/code/Sequence-Scheduling# bash train.sh
[2024-10-02 22:24:40,711] …
```
-
The CUDA and torch versions used in this project look outdated.
Can I use CUDA 12.1 instead, or would that cause bugs?
-
Dear author:
Thanks a lot for your great contribution to multi-task policy learning.
Do you have any suggestions for debugging the following issue?
When I run the following command line, I get a runtime error:
RuntimeError: …
-
Thanks for your awesome work.
While looking through the code of Mamba and Mamba2, I got really confused about the dimension of the parameter dt. I understand that delta is used to discretize A and B in the SSM. …
-
### 🐛 Describe the bug
The following benchmarks are at least 3x, if not 10x, slower on mps than on cpu on a recent MacBook Pro M3:
```python
from torch.utils.benchmark import Timer
import torch
pri…
```
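A reduced, self-contained sketch of such a comparison with `torch.utils.benchmark.Timer` (the matmul workload here is a hypothetical stand-in for the original benchmarks, and mps is skipped when unavailable):

```python
import torch
from torch.utils.benchmark import Timer

def time_matmul(device: str) -> float:
    # Time a 1024x1024 matmul on the given device; median seconds per run
    x = torch.randn(1024, 1024, device=device)
    t = Timer(stmt="x @ x", globals={"x": x})
    return t.blocked_autorange(min_run_time=0.2).median

cpu_s = time_matmul("cpu")
if torch.backends.mps.is_available():
    mps_s = time_matmul("mps")
    print(f"cpu {cpu_s:.6f}s  mps {mps_s:.6f}s  ratio {mps_s / cpu_s:.2f}x")
else:
    print(f"cpu {cpu_s:.6f}s (mps unavailable)")
```

`blocked_autorange` also handles device synchronization, which naive `time.time()` loops on mps get wrong.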
-
When I try to run `patch_model_for_compiled_runtime` with 8bit + aten, the program reports an error. How can I solve this problem?
![image](https://github.com/user-attachments/assets/f0a85477-f36e-4081-b…