-
I encountered a CUDA memory error when using torchsort.soft_rank during parallel training on the GPU. The error message is as follows:
File "/home/xxx/anaconda3/envs/DL2/lib/python3.10/site-packages/…
-
Added `devices="auto"` in `train.py` to utilize multiple GPUs:
```
trainer: Trainer = hydra.utils.instantiate(cfg.trainer,
callbacks=callbacks,
…
-
### Describe the issue
While trying to deploy the artifacts and run the ONNX model on an edge device (Linux, aarch64), I get the following error:
![image](https://github.com/…
-
I am working on mouse ATAC-seq data and have developed a function to run bias model training across all folds and comparison groups. While testing the function with fold 1 and the group fed_vs_fasted_0…
-
When I tried to run "python main_infer.py --model_name RAN4 --data_name unpaired_ct_abdomen" on my V100 card, I encountered the error "OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key conv3…
-
This could be entirely due to my setup and the mods I made to get it running (I'm also posting in case anyone else runs into it), but the initial losses are NaN due to tensors being empty. During training t…
-
# Business Need
- Ensure that users have the latest Measuring Point CSV on their field devices
# To Test
## CSV File in media folder
- Configure survey to force updates (see Devon's Teams messag…
-
## 🐛 Bug
Hello all,
I'm implementing CycleGAN with Lightning. I use PSNR and SSIM from torchmetrics for evaluation.
During training, I see that my GPU memory usage increases nonstop until overfl…
-
Hi, thanks again for this helpful repo. I am implementing this model training code but running into a strange problem. When I set my batch_size to 1 for both TF and SCST, I don't have any problems d…
-
When reproducing the experiment of pretraining on mag240m and evaluating on arxiv, we found that the Contrastive baseline achieves performance similar to Prodigy when the aux loss is applied (u…