-
### Bug description
Hi team,
I tried to use elastic launch with PyTorch Lightning on a Slurm cluster (one node with multiple GPUs). The script worked fine when I used interactive mode but did not work…
-
Hi,
Thanks for your awesome code.
When I use four V100 GPUs, the program gets stuck as follows:
```
(base)$ python launch.py --config configs/dreamfusion-if.yaml --train --gpu 0,1,2,3 syst…
```
-
### What is the motivation for this task?
Identify some flaws
### Describe the solution you'd like
dataset:
name: mvtec
format: folder
root: D:\project\anomalib\datasets\MVTec
normal_di…
-
### Bug description
Well, the title speaks for itself. When train.fit(ckpt_path=...) is called with a checkpoint, it breaks StepLR, and the learning rate is no longer changed by the LR scheduler. I have prov…
-
## :bug: Bug
Hi HTC team,
Apologies for asking for help again so soon!
I think there may be a bug in `calc_dice_metric` from `evaluate_images.py`.
### Description
```bash
(env) tay@tay:~/Cod…
```
-
I trained a single-speaker model from scratch, following **Instructions to run** in README.MD, but I get this error when training starts:
CUDA_LAUNCH_BLOCKING=1 python pflow/train.py experiment=ljspeech
…
-
### System Info
```shell
Optimum Habana : 1.6.0
SynapseAI : 1.10.0
Docker Image : Habana® Deep Learning Base AMI (Ubuntu 20.04)
Volume : 1000 GiB
```
### Information
- [X] The official example …
-
Training gets stuck at this step, and I cannot see any training progress:
(mvp) root@autodl-container-a34a11a952-3cb61709:~/autodl-tmp/absa/multi-view-prompting# bash scripts/run_unified.sh
+ export CUDA_VISIBLE_DEVICES=0
+ CUDA_VISIBLE_DEVICES=0
+ cd…
-
I encountered a bug when running on Colab:
2023-07-27 13:02:02.977976: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-pa…
-
## 🐛 Bug
`SpearmanCorrCoef` does not work with the DeepSpeed strategy when precision is 16. I believe this is related to an unexpected type conversion from 32-bit to 16-bit. Spearman logging works as expected…