danielvais closed this issue 9 months ago
Hi @danielvais Thank you for your interest in DEVIANT again. Here are a couple of things I would try:
The error appears when I train the full dataset.
Try running evaluation in a single-GPU setting:
CUDA_VISIBLE_DEVICES=0 python -u tools/train_val.py --config=experiments/run_221.yaml --resume_model=... -e
Also check whether the val batch size is too large for the available GPU memory. I see that you use a batch size of 6. You could try changing this line to set the batch size to 2.
Hi @abhi1kumar As you advised in my previous issue, I can't train the full dataset on a single GPU due to lack of memory.
Reducing the batch size didn't help, but I saw that the validation failed on the last batch, which was of size 1 instead of 2. When I removed the last image from the validation dataset, so that it contained an even number of images, the training was successful. I didn't dig into why a validation dataset with an odd number of images raises an error, but this solution is good enough for me for now. Thanks :)
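For anyone hitting the same thing, here is a minimal sketch of the arithmetic behind this workaround. It only shows why an odd-sized validation set with batch size 2 ends in a batch of size 1, and how trimming the set avoids that; it does not explain why the model fails on a single-image batch, which this thread does not establish. The helper names `batch_sizes` and `trim_to_multiple` and the 301-image split are hypothetical, not part of DEVIANT.

```python
def batch_sizes(num_samples, batch_size, drop_last=False):
    """Return the size of every batch produced by sequential batching."""
    full, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    # Without drop_last, the leftover samples form one smaller final batch.
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


def trim_to_multiple(samples, batch_size):
    """Drop trailing samples so len(samples) is a multiple of batch_size."""
    usable = len(samples) - len(samples) % batch_size
    return samples[:usable]


if __name__ == "__main__":
    val_ids = list(range(301))  # hypothetical odd-sized validation split
    print(batch_sizes(len(val_ids), 2)[-1])                       # final batch before trimming
    print(batch_sizes(len(trim_to_multiple(val_ids, 2)), 2)[-1])  # final batch after trimming
```

PyTorch's `torch.utils.data.DataLoader` also accepts `drop_last=True`, which discards the incomplete final batch instead of editing the dataset. Whether that is appropriate here depends on how the repository builds its validation loader, and either approach leaves one validation image unevaluated.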
Glad that you were able to find a good enough solution.
Hi, I'm encountering an error in the first eval epoch. The error I get is: I am running the gupnet model training:
I successfully trained the model on a subset of only 300 images. The error appears when I train on the full dataset. Any suggestions?