OpenBMB / BMTrain

Efficient Training (including pre-training and fine-tuning) for Big Models

[Bug] TypeError: gather() received an invalid combination of arguments #24

Closed: Kunlun-Zhu closed this issue 2 years ago

Kunlun-Zhu commented 2 years ago

Hi developers, when I tried to use the 'gather()' method of 'DistributedParameter', I received the following error:

, line 43, in __init__
    self.rel_embed.weight /= torch.norm(self.rel_embed.weight.gather().detach(), p=self.p_norm, dim=-1)[:, None]
TypeError: gather() received an invalid combination of arguments - got (), but expected one of:
 * (int dim, Tensor index, *, bool sparse_grad)
 * (name dim, Tensor index, *, bool sparse_grad)

I couldn't find any information about this required set of arguments. Any idea why this occurs, or how I might solve the issue? Thanks a lot.
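
For context, the gather() in this traceback resolves to torch.Tensor.gather, which selects values along a dimension by an index tensor and therefore always expects (dim, index) arguments. A minimal PyTorch sketch:

    import torch

    x = torch.tensor([[1.0, 2.0],
                      [3.0, 4.0]])
    idx = torch.tensor([[0],
                        [1]])

    # Tensor.gather picks x[i][idx[i][j]] along dim=1, so it always
    # needs a dim and an index tensor.
    print(x.gather(1, idx))  # tensor([[1.], [4.]])

    # Calling it with no arguments reproduces the TypeError above:
    # x.gather()

That self.rel_embed.weight hits PyTorch's Tensor.gather at all suggests the attribute access had already produced a plain tensor, which is consistent with the maintainer's note below.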

Kunlun-Zhu commented 2 years ago

I currently worked around this with:

self.rel_embed.weight.div_(torch.norm(self.rel_embed.weight, p=self.p_norm, dim=-1)[:, None])

a710128 commented 2 years ago

This is probably incorrect: self.rel_embed.weight is not changed in place by div_. Due to the BMTrain implementation, self.rel_embed.weight does not return the parameter itself but an "intermediate result".
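
As a toy illustration of why the in-place division is lost (an assumed analogue, not BMTrain's actual internals): if each attribute access returns a fresh tensor, an in-place op mutates only that temporary:

    import torch

    class Holder:
        # Assumed analogue of a DistributedParameter owner: each access to
        # .weight returns a fresh tensor (the "intermediate result").
        def __init__(self):
            self._storage = torch.ones(3)

        @property
        def weight(self):
            return self._storage.clone()

    h = Holder()
    h.weight.div_(2.0)  # divides only the temporary clone
    print(h.weight)     # tensor([1., 1., 1.]) -- stored values unchanged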

Kunlun-Zhu commented 2 years ago

> This is probably incorrect: self.rel_embed.weight is not changed in place by div_. Due to the BMTrain implementation, self.rel_embed.weight does not return the parameter itself but an "intermediate result".

Thanks a lot for responding. 'self.rel_embed' is created from the 'Embedding' class in the 'example/layer/embedding.py' file, so it seems to be a direct reference to that class? May I ask how we should correctly change the value of 'embedding.weight'? Thanks a lot!

a710128 commented 2 years ago

There is currently no proper way to update the parameters during training. You can normalize 'self.rel_embed' before each use.
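
A hypothetical sketch of that suggestion, normalizing the freshly read weight at each use instead of writing back into the parameter (forward() and rel_ids are illustrative names; p_norm is taken from the snippet above):

    import torch
    import torch.nn.functional as F

    def forward(self, rel_ids):
        # Each access to self.rel_embed.weight yields a fresh tensor,
        # so normalize it here rather than mutating it in place.
        w = self.rel_embed.weight
        w = w / torch.norm(w, p=self.p_norm, dim=-1, keepdim=True)
        return F.embedding(rel_ids, w)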