Hi,
I am implementing a weight penalty on the parameters passed into the optimizer. But I found that after I use --fp16, the penalty weights still have dtype torch.float16, while the model parameters end up changed from float16 to float32.
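For context, this is roughly how I see the mismatch once training is set up (here model and reg_params_list are just placeholders for my actual network and my list of penalty parameters, not names from any library):

# Placeholder names: `model` is my network, `reg_params_list` is the list of
# parameters the weight penalty is applied to.
for p in model.parameters():
    print("model param dtype:", p.dtype)      # ends up torch.float32
for p in reg_params_list:
    print("penalty param dtype:", p.dtype)    # still torch.float16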
Then I defined a function to convert the params back to float16:
import torch

def convertFP16(params):
    fp16_params = []
    for p in params:
        # Create a new float16 Parameter from each incoming parameter.
        p16 = torch.nn.Parameter(p.type(torch.float16))
        p16.grad = torch.zeros_like(p16.data)
        # Carry over the param_group attribute if the original had one.
        if hasattr(p, "param_group"):
            p16.param_group = p.param_group
        fp16_params.append(p16)
    # Debug check: count how many converted parameters are found in the list.
    count = 0
    for param in fp16_params:
        if param in fp16_params:
            print(count)
            count += 1
    return fp16_params
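I then call it roughly like this (again, reg_params_list is just a placeholder for my penalty-parameter list):

# Rebuild the penalty list in float16 so it matches the fp16 model parameters.
reg_params_list = convertFP16(reg_params_list)
# Sanity check: every converted parameter should now be float16.
assert all(p.dtype == torch.float16 for p in reg_params_list)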
But even so, it is still not working. Here is what gets printed:
==================p======================
Parameter containing:
tensor([[ 0.0251,  0.0024,  0.0033,  ...,  0.0007, -0.0041,  0.0183],
        [ 0.0001,  0.0126, -0.0248,  ..., -0.0026, -0.0132,  0.0211],
        [-0.0163,  0.0131, -0.0155,  ..., -0.0236, -0.0059,  0.0060],
        ...,
        [-0.0058, -0.0003,  0.0309,  ...,  0.0243, -0.0067, -0.0345],
        [-0.0377, -0.0127, -0.0095,  ...,  0.0212,  0.0046,  0.0353],
        [-0.0137,  0.0203, -0.0120,  ..., -0.0111, -0.0202,  0.0170]],
       device='cuda:0', dtype=torch.float16, requires_grad=True)
=========================reg_params_list[0]==============
Parameter containing:
tensor([[ 0.0251,  0.0024,  0.0033,  ...,  0.0007, -0.0041,  0.0183],
        [ 0.0001,  0.0126, -0.0248,  ..., -0.0026, -0.0132,  0.0211],
        [-0.0163,  0.0131, -0.0155,  ..., -0.0236, -0.0059,  0.0060],
        ...,
        [-0.0058, -0.0003,  0.0309,  ...,  0.0243, -0.0067, -0.0345],
        [-0.0377, -0.0127, -0.0095,  ...,  0.0212,  0.0046,  0.0353],
        [-0.0137,  0.0203, -0.0120,  ..., -0.0111, -0.0202,  0.0170]],
       device='cuda:0', dtype=torch.float16, requires_grad=True)
p in reg_params = False
To my eye, the two look identical, yet it still says that p is not in reg_params.
Could you give me some help here?
Thanks in advance,