Closed anwai98 closed 1 month ago
~For reference, this is the error I get:~
~AssertionError: No inf checks were recorded for this optimizer.~
Edit: The issue above is fixed now (the model expects the params from the DDP-wrapped model). Next, I ran into some synchronization issues and found that DDP has a `find_unused_parameters` argument, which takes care of these issues dynamically. However, it makes the training very slow (need to investigate this).
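For anyone following along, here is a minimal sketch of the setup described above. It is not the code from this PR: `TwoHeadNet` and `run_single_step` are hypothetical names, and the single-process `gloo` group on CPU is an assumption made only so the snippet runs without GPUs. It shows the optimizer being built from the DDP-wrapped model's parameters and `find_unused_parameters=True` letting a backward pass complete when one submodule never takes part in `forward()`.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class TwoHeadNet(nn.Module):
    """Hypothetical toy model: `unused_head` never takes part in forward()."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.used_head = nn.Linear(8, 1)
        self.unused_head = nn.Linear(8, 1)

    def forward(self, x):
        return self.used_head(torch.relu(self.backbone(x)))


def run_single_step():
    # Assumption: a one-process "gloo" group on CPU, just to keep this runnable.
    init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
    dist.init_process_group(
        "gloo", init_method=f"file://{init_file}", rank=0, world_size=1
    )
    try:
        # find_unused_parameters=True makes DDP search the autograd graph after
        # every forward pass for parameters that will not receive gradients.
        ddp_model = DDP(TwoHeadNet(), find_unused_parameters=True)

        # Build the optimizer from the DDP-wrapped model's parameters.
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

        optimizer.zero_grad()
        loss = ddp_model(torch.randn(4, 8)).mean()
        loss.backward()
        optimizer.step()

        # Report which parameters of the underlying module received a gradient.
        return {
            name: p.grad is not None
            for name, p in ddp_model.module.named_parameters()
        }
    finally:
        dist.destroy_process_group()
```

The extra graph traversal on every iteration is one plausible source of the slowdown mentioned above; if the set of unused parameters is known up front, freezing or removing them may avoid needing the flag, though that would have to be checked against the actual model.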
@constantinpape This is GTG from my side.
Hi @constantinpape,
I made some minor changes to the DDP-based training to fit our SAM finetuning. ~There are a few issues in handling `mixed_precision`, would be good to take a look at it.~