player1321 opened this issue 3 years ago
All tensors of the same operation must have the same placement to execute properly.
Why do you want to place weight and bias on different devices?
If you want to do pipeline parallelism, you can put two operations on different devices. This can be done by placing one operation's input tensors on device set A and the other operation's input tensors on device set B, as in the sketch below.
For pipeline parallelism you can also refer to:
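For illustration, a minimal sketch of placing two ops on different device sets and moving the activation between them (this assumes the OneFlow 0.5.x consistent-tensor API; the layer sizes and the placements are made up for the example):

```python
import oneflow as flow
import oneflow.nn as nn

P0 = flow.placement("cuda", {0: [0]})   # device set A: GPU 0 on machine 0
P1 = flow.placement("cuda", {0: [1]})   # device set B: GPU 1 on machine 0
B = flow.sbp.broadcast

# Each stage's parameters live entirely on one device set.
stage0 = nn.Linear(64, 64).to_consistent(placement=P0, sbp=B)
stage1 = nn.Linear(64, 10).to_consistent(placement=P1, sbp=B)

x = flow.randn(8, 64).to_consistent(placement=P0, sbp=B)
h = stage0(x)
# Move the activation to device set B before running the second stage,
# so all tensors of stage1's op share the same placement.
h = h.to_consistent(placement=P1, sbp=B)
y = stage1(h)
```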
@strint thanks for your reply. Actually, I want to put the whole module on devices {0: [1]} following your tutorial, but I don't know why the weight and bias end up on different devices. There seems to be something wrong with the weight: when I put the module on {0: [1]}, the weight is on {0: [0]}, and when I tried putting the module on {0: [0]}, the weight ended up on {0: [1]}, while the bias was always on the right device. The code is modified from your hugging_face_competition baseline.
What about self.init_weights()? Are there any to_consistent operations on the linear layer's weight? You can check all the to_consistent operations on the Module and the Tensor.
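To find the misplaced parameter, something like this sketch can help (`model` is a hypothetical variable for whatever module you moved; it assumes the parameters are already consistent tensors, so their .placement and .sbp are meaningful):

```python
# Print where every parameter of the model actually lives.
for name, param in model.named_parameters():
    print(name, param.placement, param.sbp)
```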
There are no to_consistent operations in self.init_weights(). Will the initialization operations change the device?
I found output_embeddings.weight = input_embeddings.weight in self.init_weights(); that should be the bug.
But output_embeddings.weight = output_embeddings.weight.to_consistent(placement=P1, sbp=BROADCAST) does not seem to be the right way to handle it; it raises a TypeError:
TypeError: cannot assign '<class 'oneflow._oneflow_internal.Tensor'>' as parameter 'weight' (nn.Parameter or None expected)
Try this:
output_embeddings.weight = nn.Parameter(output_embeddings.weight.to_consistent(placement=P1, sbp=BROADCAST))
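In context, a sketch of where that fix could go (assuming init_weights() has already tied output_embeddings.weight to input_embeddings.weight; P1, BROADCAST, and the function name are illustrative, not part of the baseline):

```python
import oneflow as flow
import oneflow.nn as nn

P1 = flow.placement("cuda", {0: [1]})   # device set of the output head
BROADCAST = flow.sbp.broadcast

def retie_output_weight(output_embeddings):
    # Move the tied weight onto the output head's device set and wrap it
    # back into an nn.Parameter so the attribute assignment is accepted.
    output_embeddings.weight = nn.Parameter(
        output_embeddings.weight.to_consistent(placement=P1, sbp=BROADCAST)
    )
```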
Thanks, it worked. But I got another bug:
Summary
When implementing pipeline parallelism with to_consistent, different parameters of the same op are placed on different GPUs, so the model cannot run.
Code to reproduce bug
```python
class BertForPreTraining(nn.Module):
    ...
```
System Information
python3 -m oneflow --doctor:
version: 0.5.0+cu110
cmake_build_type: Release
rdma: True