add mask for pad token

CERC-AAI / multimodal

An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.

Apache License 2.0

8 stars, 3 forks

Status: Closed (closed by floatingbigcat 1 year ago)

floatingbigcat commented 1 year ago

Mask the pad token by setting its positions to 0.0 in the loss_mask, so that padding does not contribute to the loss.
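
For illustration, here is a minimal sketch of the idea in plain Python. It is not the code from this change: the helper names (build_loss_mask, masked_mean_loss) and the pad id of 0 are assumptions, and the real code would presumably operate on tensors rather than nested lists.

```python
def build_loss_mask(token_ids, pad_token_id):
    """Return a per-token loss mask: 1.0 for real tokens, 0.0 for padding."""
    return [
        [0.0 if tok == pad_token_id else 1.0 for tok in row]
        for row in token_ids
    ]


def masked_mean_loss(per_token_loss, loss_mask):
    """Average the per-token losses over unmasked positions only."""
    total = 0.0
    count = 0.0
    for loss_row, mask_row in zip(per_token_loss, loss_mask):
        for loss, mask in zip(loss_row, mask_row):
            total += loss * mask  # padded positions contribute 0.0
            count += mask
    # Guard against a batch that consists entirely of padding.
    return total / count if count > 0 else 0.0


PAD_ID = 0  # hypothetical pad token id, chosen only for this example
batch = [[5, 7, 9, PAD_ID], [3, PAD_ID, PAD_ID, PAD_ID]]
mask = build_loss_mask(batch, PAD_ID)

# Made-up per-token losses; the large values sit on padded positions.
losses = [[1.0, 2.0, 3.0, 10.0], [2.0, 10.0, 10.0, 10.0]]
mean_loss = masked_mean_loss(losses, mask)
```

Dividing by the sum of the mask rather than by the total number of positions keeps heavily padded batches from diluting the average.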