OpenBMB / BMTrain

Efficient Training (including pre-training and fine-tuning) for Big Models
Apache License 2.0

[FeatureRequest] `bmt.OpTransformerBlockList` **DOES NOT** support multiple return values from a transformer block's forward propagation #91

Closed eggiter closed 1 year ago

eggiter commented 1 year ago

1. Currently, bmt.OpTransformerBlockList can only handle the hidden states returned by a transformer block.

  1. The transformer block in the recently released flash_attn returns both hidden_states and residual so that Dropout -> Add -> LN can be fused. Both values are then passed to the next block as input;
  2. This case does not seem to have been considered in bmt.OpTransformerBlockList, which cannot handle it properly.

2. Request: support transformer blocks that return multiple values, as in the case above.
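
To make the request concrete, here is a minimal sketch of the behaviour being asked for. It is plain Python with floats standing in for tensors, and it is not BMTrain's implementation: `ToyBlock` and `MultiOutputBlockList` are hypothetical names invented for this illustration, and the arithmetic only mimics the shape of a fused Dropout -> Add -> LN block.

```python
class ToyBlock:
    """Hypothetical block that returns (hidden_states, residual)."""

    def __init__(self, scale):
        self.scale = scale

    def __call__(self, hidden_states, residual=None):
        # "Add": fold the incoming hidden states into the running residual.
        residual = hidden_states if residual is None else hidden_states + residual
        # Stand-in for LN plus the attention/MLP body: a simple scaling.
        hidden_states = residual * self.scale
        return hidden_states, residual


class MultiOutputBlockList:
    """Hypothetical container that passes every return value to the next block."""

    def __init__(self, blocks):
        self.blocks = list(blocks)

    def __call__(self, *inputs):
        for block in self.blocks:
            outputs = block(*inputs)
            # Normalize a single return value to a 1-tuple so both cases chain.
            inputs = outputs if isinstance(outputs, tuple) else (outputs,)
        # Unwrap when the last block returned only one value.
        return inputs[0] if len(inputs) == 1 else inputs


layers = MultiOutputBlockList([ToyBlock(2.0), ToyBlock(0.5)])
hidden_states, residual = layers(1.0)
print(hidden_states, residual)  # 1.5 3.0
```

The only design point the sketch is meant to show is that the container treats each block's outputs as the next block's positional inputs, so a block that returns only hidden states still works unchanged.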

eggiter commented 1 year ago

Closing this issue since the feature is now supported by #92.