-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
- `llamafactory` version: 0.9.1.dev0
- Platform: Linux-5.15.0-122-generic-x86_64-with-glibc2.35
- Pytho…
-
Hello,
I'm trying to follow this tutorial: https://nvidia.github.io/cuda-quantum/latest/applications/python/unitary_compilation_diffusion_models.html
When calling:
`out_tensors = infer_comp.ge…
-
args: Namespace(adam_epsilon=1e-06, bert_lr=3e-05, channel_type='context-based', config_name='', data_dir='./dataset/docred', dataset='docred', dev_file='dev.json',
down_dim=256, evaluation_steps=-1…
-
# 🐛 Bug
When sampling from a prior that's been moved to GPU, the correct device is only used for some priors, even though the `state_dict` has been updated correctly (as of https://github.com/corne…
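A common cause of this kind of device mismatch is a distribution object that caches tensors built at construction time instead of reading them from the module's state. A minimal sketch of the usual fix in plain PyTorch (not the library in this issue; all names here are hypothetical) is to register the prior's parameters as buffers so `.to(device)` moves them, and to rebuild the distribution from those buffers at sample time:

```python
import torch
from torch import nn

class GaussianPrior(nn.Module):
    """Toy prior whose parameters are registered as buffers, so that
    .to(device) moves them along with the rest of the model.
    Hypothetical names, for illustration only."""
    def __init__(self, dim):
        super().__init__()
        self.register_buffer("loc", torch.zeros(dim))
        self.register_buffer("scale", torch.ones(dim))

    def sample(self, n):
        # Rebuild the distribution from the (possibly moved) buffers,
        # so samples land on whatever device the module lives on.
        dist = torch.distributions.Normal(self.loc, self.scale)
        return dist.sample((n,))

prior = GaussianPrior(4)
device = "cuda" if torch.cuda.is_available() else "cpu"
prior = prior.to(device)
samples = prior.sample(8)
```

If the distribution were instead created once in `__init__` and stored, its internal tensors would stay on the original device even after the `state_dict` is updated, which matches the symptom described above.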
-
(pyg) ubuntu@ubuntu:/mnt/disk/hzy/pyg/HaPPy$ python train.py
loading data
start training
epoch: 0 i 0 loss: 48.30012893676758
Traceback (most recent call last):
File "train.py", line 326, in…
-
When I run:
> RAYON_NUM_THREADS=6 CUDA_VISIBLE_DEVICES=0 python3 -m rest.inference.cli --datastore-path datastore/datastore_chat_small.idx --base-model meta-llama/Meta-Llama-3-8B-Instruct
I get:
…
-
**Describe the bug**
I tried to train a model with 'mobilenetv4_conv_large.e600_r384_in1k' as the backbone and got the error below. Other models train without any problems.
```
/home/xxxxxx/.local/lib/pyt…
-
Hi authors, I find this work useful and meaningful. I am trying to fine-tune the model on my own curated pmhc-tcr pairs. However, I encountered an issue with the kfold_data in the config. I replace …
-
```
python train.py --outdir=./test --data=./images256x256.zip --cfg=stylegan3-r --gpus=1 --batch=32 --gamma=0.5 \
--freezed=13 --workers=2 --mirror=1 --kimg=2000 --tick=1 --snap=10 --metrics=none -…
-
### Feature request
Is there a tutorial for using DeepSpeed's activation checkpointing instead of PyTorch's?
I'm using `Trainer` with ZeRO integration to train my model. Here's my code:
```py…