-
Thought about this a bit, and I have a few ideas to share. # 1 I think is straightforward. # 2 is a bit more complicated but probably worthwhile. # 3 is way out there, but may solve problem …
-
**1. After installing the dependencies, running the command `streamlit run web_feadback.py --server.port=8080` produces this error:**
```
/root/anaconda3/envs/lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /root/anaconda3/en…
-
`lifetimes` has a utility function for validating inputs prior to model fitting, which I expanded on in my `btyd` fork for `GammaGammaModel` input validation:
https://github.com/ColtAllen/btyd/blob…
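For illustration, that kind of pre-fit check might look roughly like the sketch below (a hypothetical helper, not the actual `lifetimes`/`btyd` code): Gamma-Gamma fitting assumes strictly positive frequency and monetary value, and near-zero correlation between the two.

```python
import numpy as np

def check_gamma_gamma_inputs(frequency, monetary_value, max_corr=0.3):
    """Hypothetical validator: raise on inputs Gamma-Gamma cannot fit."""
    frequency = np.asarray(frequency, dtype=float)
    monetary_value = np.asarray(monetary_value, dtype=float)
    if frequency.shape != monetary_value.shape:
        raise ValueError("frequency and monetary_value must have the same shape")
    # The model is fit on repeat customers with positive spend.
    if np.any(frequency <= 0) or np.any(monetary_value <= 0):
        raise ValueError("frequency and monetary_value must be strictly positive")
    # Gamma-Gamma assumes purchase frequency and spend are roughly independent.
    corr = np.corrcoef(frequency, monetary_value)[0, 1]
    if abs(corr) > max_corr:
        raise ValueError(f"frequency/monetary_value correlation too high: {corr:.2f}")
```

The correlation threshold (0.3 here) is a conventional rule of thumb, not a hard API constant.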
-
![Screenshot](https://github.com/user-attachments/assets/f6541a9e-b8d5-4242-a1a2-b3a456e7a716)
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/util…
-
Even if a seed is given to the generator, the result changes slightly each time.
```
import random

import numpy as np
import torch


def torch_fix_seed(seed=42):
    # Python random
    random.seed(seed)
    # Numpy
    np.random.seed(s…
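# --- Hedged sketch of the usual full version of such a helper (an assumption:
# --- the truncated function above presumably goes on to seed PyTorch too).
# --- The torch import is guarded so this sketch also runs where torch is absent.
try:
    import torch
except ImportError:
    torch = None

def fix_seed_full(seed=42):
    import random
    import numpy as np
    random.seed(seed)        # Python random
    np.random.seed(seed)     # NumPy
    if torch is not None:
        torch.manual_seed(seed)                    # CPU RNG
        torch.cuda.manual_seed_all(seed)           # all GPU RNGs
        torch.backends.cudnn.deterministic = True  # fixed conv algorithms
        torch.backends.cudnn.benchmark = False     # disable autotuner

# Even with every seed fixed, some CUDA kernels are nondeterministic by design;
# torch.use_deterministic_algorithms(True) raises on those ops instead of
# letting results drift, which is one explanation for "slightly different" runs.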
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue y…
-
Hi, I'm reproducing your work.
When I use the `round3_training_data.json` data to SFT the `deepseek-math-7b-base-value_model` (after adding a value head), I get the error below:
```shell
File "/home/wor…
-
# 🌟 New model addition
## Model description
https://arxiv.org/abs/2107.02192
In this paper, they propose Long-Short Transformer, an efficient self-attention mechanism for modeling long sequen…
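Very roughly, the mechanism can be sketched as follows (an illustrative NumPy toy, not the paper's exact formulation: each query attends jointly to a short sliding window of nearby keys and to a handful of low-rank "global" slots; the paper's learned dynamic projection is stood in for by a fixed random matrix here):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def long_short_attention(Q, K, V, window=2, r=4, rng=None):
    """Toy long-short attention: local window + r compressed global slots."""
    n, d = Q.shape
    rng = np.random.default_rng(0) if rng is None else rng
    # Stand-in for the learned projection: columns sum to 1 over positions.
    P = softmax(rng.standard_normal((n, r)), axis=0)
    K_bar, V_bar = P.T @ K, P.T @ V  # (r, d) compressed long-range memory
    out = np.zeros_like(Q)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        keys = np.vstack([K[lo:hi], K_bar])   # local keys + global slots
        vals = np.vstack([V[lo:hi], V_bar])
        w = softmax(keys @ Q[i] / np.sqrt(d)) # one joint softmax over both
        out[i] = w @ vals
    return out
```

The point of the construction is cost: the local branch is O(n·window) and the global branch O(n·r), versus O(n²) for full attention.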
-
### System Info
I am trying to run qwen2-7b-instruct with AWQ quantization in a Kubernetes environment. The GPU is a single T4 (16 GB VRAM).
I see that it is unable to use FlashAttention v2 on the T4 and …
-
Hello, I am trying to run the code in the provided Colab. I have not changed anything in the code yet.
After I ran this part:
```
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.…