What hardware was used for the training to reproduce the results in the paper?
If you want to reproduce all results (TE1 to TE7), at least 8 GPUs with 32 GB of memory each are required (we used 8x V100). A single 24 GB GPU can reproduce the results up to TE3 with a batch size of 32.
Thanks for the reply. How does batch size affect training, in your experience? Have you tried gradient accumulation?
No, we didn't use gradient accumulation in this research, and I'm not sure whether employing GA would fully reproduce the original results.
When we experimented with a range of batch sizes (4 to 128), performance gradually increased up to a batch size of 32. With a batch size of 64 or more, performance decreased or was similar to that at 32.
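For anyone who does want to try gradient accumulation despite that uncertainty, here is a minimal sketch assuming a standard PyTorch training loop; the model, loss, and loader below are toy placeholders, not this repo's code:

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs on its own; the repo's real model,
# loss, and dataloader would replace these.
model = nn.Conv2d(3, 1, 3, padding=1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loader = [(torch.randn(8, 3, 64, 64), torch.rand(8, 1, 64, 64)) for _ in range(8)]

accum_steps = 4  # four micro-batches of 8 -> effective batch size of 32

optimizer.zero_grad()
for step, (images, masks) in enumerate(loader):
    loss = criterion(model(images), masks)
    (loss / accum_steps).backward()  # scale so gradients average over the window
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

One caveat: BatchNorm layers still compute statistics per micro-batch, so GA is not exactly equivalent to a true batch of 32, which is consistent with the authors' hesitation above.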
How much time did the training process take?
With a batch size of 32, training took 3 to 4 hours for TE3 and 11 to 14 hours for TE7. The 24 GB GPU we used was an RTX 3090.
Thanks for the clarifications
I managed to reproduce the metrics with a single RTX 3060 on all architectures except TE0 and TE5. Do you have any clue why that might be happening?
@christina284 did you manage to reproduce even the TE7 DUTS results with the 3060?
Yes, with a batch size of 16.
@christina284 How about the input image size? Did you set it to 640 for TE7?
Wow, and you got 0.022 MAE and the other metrics on DUTS, right?
> How about the input image size? Did you set it to 640 for TE7?

Yes, of course. It took about 10 GB of memory to run.
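For anyone wanting to check the footprint on their own card, PyTorch's built-in counters make this easy; a generic sketch (not repo-specific, and it requires a CUDA device):

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run one training step or a full epoch here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"peak allocated: {peak_gb:.2f} GB")
```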
> And you got 0.022 MAE and the other metrics on DUTS, right?

I actually tested it only on the DUT-O dataset.
How long did it take to reproduce the results?
Wow, that's quite interesting. I tried it on an RTX 3090 but ran into OOM.
Were there any code adjustments needed to run on the 3060?
@christina284, could you please tell us more about your experience and the code adjustments? The current master code runs OOM on anything above TE0/320 with 11 GB of VRAM.
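Not speaking for the authors, but the two usual adjustments when training OOMs are a smaller batch size and mixed precision. A minimal sketch with `torch.cuda.amp`, using toy placeholders rather than this repo's code:

```python
import torch
import torch.nn as nn

# Toy stand-ins; swap in the repo's real model, loss, and dataloader.
model = nn.Conv2d(3, 1, 3, padding=1).cuda()
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loader = [(torch.randn(8, 3, 320, 320), torch.rand(8, 1, 320, 320)) for _ in range(4)]
scaler = torch.cuda.amp.GradScaler()

for images, masks in loader:  # a smaller batch (e.g. 8 or 16) also helps
    images, masks = images.cuda(), masks.cuda()
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # fp16 activations cut memory roughly in half
        loss = criterion(model(images), masks)
    scaler.scale(loss).backward()  # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
```

Whether this reproduces the paper's numbers exactly is a separate question; as noted above, the metrics are somewhat sensitive to the effective batch size.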