fidelity / stoke

A lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices, distributed modes, mixed-precision, and PyTorch extensions.
https://fidelity.github.io/stoke/
Apache License 2.0

attr.exceptions.NotAnAttrsClassError: <class 'float'> is not an attrs-decorated class. #25

Closed: rushi-the-neural-arch closed this issue 2 years ago

rushi-the-neural-arch commented 2 years ago

Describe the bug

The verbose=True option (the default) in the Stoke class throws an attrs error while printing out the configuration details:

attr.exceptions.NotAnAttrsClassError: <class 'float'> is not an attrs-decorated class.

To Reproduce

The sample script is posted here - Stoke-DDP

Just change the verbose=False parameter to verbose=True in the Stoke class arguments to reproduce the bug:

python -m torch.distributed.launch Stoke-DDP.py --projectName "PyTorch-4K-2X" --batchSize 20 --nEpochs 2 --lr 1e-3 --threads 8

Expected behavior

Print out all of the parameter info passed to the Stoke class.

Screenshots/Code Snippets


    from stoke import Stoke, DistributedOptions  # model/optimizer/loss/configs defined earlier in the script

    stoke_model = Stoke(
        model=model,
        verbose=True,     # verbose just prints out the config info, but triggers the attrs error here
        optimizer=optimizer,
        loss=loss,
        batch_size_per_device=opt.batchSize,
        gpu=True,
        fp16=None,  # FP16Options.amp
        distributed=DistributedOptions.ddp,  # "ddp"
        fairscale_oss=True,
        fairscale_sddp=True,
        grad_accum_steps=4,
        grad_clip=opt.grad_clip,
        configs=[amp_config, ddp_config, oss_config],
    )


Environment:

ncilfone commented 2 years ago

This one is most likely just poor documentation and a lack of exception checking on my part... To handle grad clipping automatically, you need to pass in an object of type ClipGradConfig or ClipGradNormConfig, since stoke needs to map to the correct clipping method (some methods aren't supported in certain setups, e.g. OSS doesn't support clip by value).
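
For context: the verbose printing relies on attrs introspection (something like attr.asdict), which only accepts attrs-decorated classes, so a bare float blows up. A minimal sketch of the failure mode (the class below is a stand-in, not stoke's actual internals):

    import attr

    @attr.s(auto_attribs=True)
    class ClipGradNormConfig:  # stand-in for stoke's attrs-backed config
        max_norm: float
        norm_type: float = 2.0

    attr.asdict(ClipGradNormConfig(max_norm=0.5))  # fine: {'max_norm': 0.5, 'norm_type': 2.0}
    attr.asdict(0.5)  # attr.exceptions.NotAnAttrsClassError: <class 'float'> is not an attrs-decorated class.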

So your code would be something like this:

    from stoke import ClipGradNormConfig

    stoke_model = Stoke(
        model=model,
        verbose=True,     # with grad_clip as a config object, verbose printing works
        optimizer=optimizer,
        loss=loss,
        batch_size_per_device=opt.batchSize,
        gpu=True,
        fp16=None,  # FP16Options.amp
        distributed=DistributedOptions.ddp,  # "ddp"
        fairscale_oss=True,
        fairscale_sddp=True,
        grad_accum_steps=4,
        grad_clip=ClipGradNormConfig(max_norm=opt.grad_clip, norm_type=2.0),
        configs=[amp_config, ddp_config, oss_config],
    )
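
(ClipGradNormConfig's max_norm/norm_type mirror the arguments of torch.nn.utils.clip_grad_norm_, while ClipGradConfig is the clip-by-value variant, which is the one OSS can't handle.)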

I've added an exception check to my local branch for your open issues, which should prevent this from happening in the future...
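
For reference, a sketch of the kind of check I mean (hypothetical, not the actual patch; it assumes the config classes are importable from the top-level package):

    from stoke import ClipGradConfig, ClipGradNormConfig

    def _check_grad_clip(grad_clip):
        # Reject raw floats/ints up front instead of failing later
        # inside the attrs-based verbose printing
        if grad_clip is not None and not isinstance(
            grad_clip, (ClipGradConfig, ClipGradNormConfig)
        ):
            raise TypeError(
                f"grad_clip must be a ClipGradConfig or ClipGradNormConfig "
                f"instance, got {type(grad_clip).__name__}"
            )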

ncilfone commented 2 years ago

Lmk if that fixes the error so I can close...

rushi-the-neural-arch commented 2 years ago

Yes, thank you, this fixes the issue!