wawjwyy123 opened 1 year ago
The baseline code currently does not work well with multi-GPU, and IMO multi-GPU training is not needed since the training is already very fast. It is better to run hyper-parameter optimization across GPUs instead; I've added an Optuna script for that.
It was helpful for me to change the PyTorch Lightning mode to "dp", as suggested in issue #38. The problem is that setting PyTorch Lightning to "dp" mode causes two problems when working with multi-GPU:
and the program sets the right filename according to the hash value:
torchmetrics.classification.f_beta.MultilabelF1Score
doesn't work well in a multi-GPU setting (refer to this issue). I don't know how to fix the bug (I'm not familiar with PyTorch Lightning). My solution is to comment out the code associated with torchmetrics. Maybe someone knows how to fix this bug?