rushi-the-neural-arch closed this issue 2 years ago.
This one is most likely just poor documentation and a lack of exception checking on my part. To handle gradient clipping automatically, you need to pass in an object of type ClipGradConfig or ClipGradNormConfig, since Stoke needs to map it to the correct clipping method (some methods are not supported in certain cases, e.g. OSS doesn't support clip-by-value).
So your code would be something like this:
from stoke import ClipGradNormConfig, DistributedOptions, Stoke

stoke_model = Stoke(
    model=model,
    verbose=True,  # prints out the configuration details (where the attrs error was raised)
    optimizer=optimizer,
    loss=loss,
    batch_size_per_device=opt.batchSize,
    gpu=True,
    fp16=None,  # or FP16Options.amp
    distributed=DistributedOptions.ddp,
    fairscale_oss=True,
    fairscale_sddp=True,
    grad_accum_steps=4,
    # grad_clip must be a config object, not a bare float like opt.grad_clip
    grad_clip=ClipGradNormConfig(max_norm=opt.grad_clip, norm_type=2.0),
    configs=[amp_config, ddp_config, oss_config],
)
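For reference, the norm-based clipping that ClipGradNormConfig(max_norm=..., norm_type=2.0) requests usually works like PyTorch's torch.nn.utils.clip_grad_norm_: all gradients are treated as one vector, its p-norm is measured, and if that norm exceeds max_norm every gradient is scaled by roughly max_norm / total_norm. The pure-Python sketch below works on plain lists rather than tensors and is not Stoke's implementation; the clip_grad_norm helper is hypothetical, and the 1e-6 epsilon mirrors the one PyTorch adds to avoid dividing by zero.

```python
import math
from typing import List


def clip_grad_norm(grads: List[List[float]], max_norm: float, norm_type: float = 2.0) -> float:
    """Hypothetical helper: scale grads in place so their combined p-norm is at most max_norm.

    Returns the total norm measured before clipping.
    """
    # Treat every gradient "tensor" (here a list of floats) as part of one long vector.
    flat = [g for tensor in grads for g in tensor]
    if math.isinf(norm_type):
        # The infinity norm is just the largest absolute value.
        total_norm = max((abs(g) for g in flat), default=0.0)
    else:
        total_norm = sum(abs(g) ** norm_type for g in flat) ** (1.0 / norm_type)

    # Small epsilon guards against division by zero when all gradients are 0.
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        # Only rescale when the norm actually exceeds max_norm.
        for tensor in grads:
            for i in range(len(tensor)):
                tensor[i] *= clip_coef
    return total_norm
```

With gradients [[3.0], [4.0]] the total L2 norm is 5.0, so clipping to max_norm=1.0 rescales them to roughly [[0.6], [0.8]], while gradients whose norm is already below max_norm are left untouched.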
I've added an exception check to my local branch for your open issues, which should prevent this from happening in the future.
Let me know if that fixes the error so I can close this.
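The exception check mentioned above isn't shown in the thread, but a minimal sketch of the idea might look like the following. It assumes plain dataclass stand-ins for Stoke's config classes (the real ones are presumably attrs classes, given the attrs error in this issue); the clip_value field name and the check_grad_clip helper are hypothetical. A bare float such as opt.grad_clip is rejected up front with a clear TypeError, and clip-by-value is refused when fairscale OSS is enabled.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for Stoke's clipping configs; field names are assumptions.
@dataclass
class ClipGradConfig:
    clip_value: float


@dataclass
class ClipGradNormConfig:
    max_norm: float
    norm_type: float = 2.0


def check_grad_clip(grad_clip, fairscale_oss: bool = False) -> Optional[str]:
    """Hypothetical validation: return which clipping method to use, or raise early."""
    if grad_clip is None:
        # No clipping requested.
        return None
    if not isinstance(grad_clip, (ClipGradConfig, ClipGradNormConfig)):
        # e.g. a bare float, which would otherwise fail later with a confusing error
        raise TypeError(
            "grad_clip must be a ClipGradConfig or ClipGradNormConfig, "
            f"got {type(grad_clip).__name__}"
        )
    if isinstance(grad_clip, ClipGradConfig):
        if fairscale_oss:
            # OSS does not support clip-by-value, per the discussion above.
            raise ValueError("clip-by-value is not supported with fairscale OSS")
        return "clip_grad_value"
    return "clip_grad_norm"
```

Validating at construction time means the user sees a message naming the expected config types instead of an attrs error raised later while the verbose configuration is being printed.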
Yes, thank you, this fixes the issue!
Describe the bug
The default verbose=True option in the Stoke class throws an attrs error while printing out the configuration details:
attr.exceptions.NotAnAttrsClassError: <class 'float'> is not an attrs-decorated class.
To Reproduce
The sample script is posted here: Stoke-DDP
Just change the verbose=False parameter to verbose=True in the Stoke class arguments to reproduce the bug:
python -m torch.distributed.launch Stoke-DDP.py --projectName "PyTorch-4K-2X" --batchSize 20 --nEpochs 2 --lr 1e-3 --threads 8
Expected behavior
Stoke should print out the info for all the parameters passed to the Stoke class.
Screenshots/Code Snippets
Environment: