ROCm / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

Consider both contiguous and channels_last tensors for FusedSGD #97

Closed hubertlu-tw closed 1 year ago

hubertlu-tw commented 1 year ago

Authored by @luise1030 to address tensor memory-format inconsistency observed when ResNet50 is trained in NHWC (channels_last) format. To run the unit tests covering the code changes in this PR:

$ python tests/L0/run_test.py --include run_optimizers

test_float (test_fused_optimizer.TestFusedSGD) ... ok
test_half (test_fused_optimizer.TestFusedSGD) ... ok
test_multi_device (test_fused_optimizer.TestFusedSGD) ... ok

Internal JIRA ticket for the context: https://ontrack-internal.amd.com/browse/SWDEV-357815

hubertlu-tw commented 1 year ago

This PR resolves the issue where a parameter p in parameters() of apex.optimizers.FusedSGD and its gradient p.grad are not in the same memory format.

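The mismatch can be inspected directly on a channels_last model. Below is a minimal sketch (a hypothetical illustration, not part of this PR or its tests; it uses a single Conv2d in place of ResNet50) showing how to check the memory format of a parameter and of its gradient after a backward pass:

```python
import torch
import torch.nn as nn

# Hypothetical illustration: convert a small model and its input to channels_last (NHWC),
# run a backward pass, then compare the memory formats of a parameter and its gradient.
model = nn.Conv2d(3, 64, kernel_size=7, padding=3).cuda().to(memory_format=torch.channels_last)
x = torch.randn(8, 3, 64, 64, device="cuda").to(memory_format=torch.channels_last)

model(x).sum().backward()

p = model.weight
print(p.is_contiguous(memory_format=torch.channels_last))       # parameter is channels_last
print(p.grad.is_contiguous(memory_format=torch.channels_last))  # gradient may still be in torch.contiguous_format
```
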
Before this PR:

| p in parameters() (Test A) | p (Test B) | p.grad (Test A) | p.grad (Test B) | Comparison | Result |
| --- | --- | --- | --- | --- | --- |
| torch.contiguous_format | torch.channels_last | torch.contiguous_format | torch.contiguous_format | Test A = torch.optim.SGD, Test B = torch.optim.SGD | Same |
| torch.contiguous_format | torch.channels_last | torch.contiguous_format | torch.contiguous_format | Test A = torch.optim.SGD, Test B = apex.optimizers.FusedSGD | Different |

With this PR:

| p in parameters() (Test A) | p (Test B) | p.grad (Test A) | p.grad (Test B) | Comparison | Result |
| --- | --- | --- | --- | --- | --- |
| torch.contiguous_format | torch.channels_last | torch.contiguous_format | torch.contiguous_format | Test A = torch.optim.SGD, Test B = torch.optim.SGD | Same |
| torch.contiguous_format | torch.channels_last | torch.contiguous_format | torch.contiguous_format | Test A = torch.optim.SGD, Test B = apex.optimizers.FusedSGD | Same |
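For context, here is a minimal sketch of the kind of comparison the tables summarize. It is not the actual unit test from this PR; it assumes apex is installed with the fused optimizers built, and the tensor shape, learning rate, and momentum are arbitrary placeholders:

```python
import torch
from apex.optimizers import FusedSGD

torch.manual_seed(0)

# Test A: contiguous parameter updated by torch.optim.SGD.
p_ref = torch.randn(8, 16, 4, 4, device="cuda", requires_grad=True)
grad = torch.randn_like(p_ref)

# Test B: same values, but the parameter is laid out as channels_last
# while its gradient stays in torch.contiguous_format (the mismatch above).
p_cl = p_ref.detach().clone().to(memory_format=torch.channels_last).requires_grad_(True)

p_ref.grad = grad.clone()
p_cl.grad = grad.clone().contiguous()

opt_a = torch.optim.SGD([p_ref], lr=0.1, momentum=0.9)
opt_b = FusedSGD([p_cl], lr=0.1, momentum=0.9)

opt_a.step()
opt_b.step()

# Before this PR the FusedSGD update could diverge from torch.optim.SGD
# for this layout combination; with this PR the results should match.
print(torch.allclose(p_ref, p_cl))
```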