OpenBMB / ModelCenter

Efficient, Low-Resource, Distributed transformer implementation based on BMTrain
https://modelcenter.readthedocs.io
Apache License 2.0
233 stars 28 forks

Question in ModelCenter/model_center/layer/transformer.py #18

Closed Kunlun-Zhu closed 2 years ago

Kunlun-Zhu commented 2 years ago

May I ask about line 23:

Why do the encoder and decoder use 'nn.Module' instead of 'bmt.DistributedModule'?

May I ask in which circumstances we should use 'nn.Module' instead?

Thanks

Kunlun-Zhu commented 2 years ago

Here is the direct link to the file: transformer

Achazwl commented 2 years ago

It is for modules that have a DistributedParameter as a direct child. A DistributedModule can also be used as an nn.Module and be a submodule of another nn.Module.

Kunlun-Zhu commented 2 years ago

Thanks for the answer! So will DistributedModule automatically turn its nn.Parameters into DistributedParameters, is that what you mean? In 'transformer.py', would the program go wrong if we changed nn.Module to bmt.DistributedModule? Or do you mean that nn.Module is only safe when we use models from model_center, or when we set the parameters to DistributedParameter manually? Thanks.

Achazwl commented 2 years ago

> So will DistributedModule automatically turn its nn.Parameters into DistributedParameters?

You should turn nn.Parameters into DistributedParameters manually, even with DistributedModule.

> In 'transformer.py', would the program go wrong if we changed nn.Module to bmt.DistributedModule?

This should not go wrong. bmt.DistributedModule only takes care of the DistributedParameters directly in it. If there are no DistributedParameters in it, nn.Module and bmt.DistributedModule behave the same.
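The answer above can be illustrated with a small self-contained toy model. The classes below are simplified, hypothetical stand-ins for torch.nn.Module, bmt.DistributedModule, and bmt.DistributedParameter, not the real implementations; they only mimic the registration behaviour being discussed:

```python
# Toy sketch only -- illustrative stand-ins, NOT the real torch.nn /
# bmtrain code. DistributedModule gives special treatment to
# DistributedParameter attributes assigned *directly* on it, while a
# plain Module does not; a container with no direct DistributedParameter
# therefore behaves the same under either base class.

class Parameter:
    def __init__(self, data):
        self.data = data

class DistributedParameter(Parameter):
    """Stand-in for bmt.DistributedParameter (data sharded per rank)."""

class Module:
    """Stand-in for torch.nn.Module: registers params and submodules."""
    def __init__(self):
        self._params, self._modules = {}, {}

    def __setattr__(self, name, value):
        if isinstance(value, Parameter):
            self.__dict__.setdefault("_params", {})[name] = value
        elif isinstance(value, Module):
            self.__dict__.setdefault("_modules", {})[name] = value
        object.__setattr__(self, name, value)

class DistributedModule(Module):
    """Stand-in for bmt.DistributedModule: additionally registers its
    *direct* DistributedParameter children for sharded handling."""
    def __setattr__(self, name, value):
        if isinstance(value, DistributedParameter):
            self.__dict__.setdefault("_dist_params", {})[name] = value
        super().__setattr__(name, value)

class Linear(DistributedModule):
    """A leaf layer: holds a DistributedParameter directly, so it needs
    DistributedModule to get the sharded registration."""
    def __init__(self):
        super().__init__()
        self.weight = DistributedParameter([1.0, 2.0])

class Encoder(Module):
    """A container like the Encoder/Decoder in transformer.py: its only
    direct children are submodules, so plain Module is enough."""
    def __init__(self):
        super().__init__()
        self.layer = Linear()   # DistributedModule nested inside Module

class EncoderAsDist(DistributedModule):
    """The same container with the base class swapped: with no direct
    DistributedParameter, nothing extra is registered, so it behaves
    identically -- which is why the swap 'should not go wrong'."""
    def __init__(self):
        super().__init__()
        self.layer = Linear()
```

In this sketch, the Encoder/Decoder-style container only holds submodules and never a DistributedParameter directly, which matches why nn.Module suffices in transformer.py.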

Kunlun-Zhu commented 2 years ago

Thanks, this answers my questions.