CERC-AAI/multimodal
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
Apache License 2.0
add mask for pad token #14
Closed. floatingbigcat closed this 1 year ago.

floatingbigcat commented 1 year ago:
Mask pad tokens by setting their entries in the loss_mask to 0.0.
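For illustration, here is a minimal pure-Python sketch of what masking pad tokens in a loss mask does. It is not the code from this PR. The names `build_loss_mask`, `masked_mean_loss`, and `PAD_ID` are hypothetical, and pad id 0 is an assumption. The sketch also assumes the loss is averaged only over unmasked positions (sum of loss times mask, divided by the sum of the mask). In a PyTorch training loop the same idea would usually be a single tensor operation, such as zeroing `loss_mask` wherever `tokens == pad_id`.

```python
from typing import List


def build_loss_mask(tokens: List[int], pad_token_id: int) -> List[float]:
    """Return a per-token loss mask: 1.0 for real tokens, 0.0 for padding."""
    return [0.0 if tok == pad_token_id else 1.0 for tok in tokens]


def masked_mean_loss(per_token_loss: List[float], loss_mask: List[float]) -> float:
    """Average the per-token loss over unmasked (non-pad) positions only."""
    if len(per_token_loss) != len(loss_mask):
        raise ValueError("loss and mask must have the same length")
    total = sum(l * m for l, m in zip(per_token_loss, loss_mask))
    count = sum(loss_mask)
    # Guard against an all-padding sequence to avoid dividing by zero.
    return total / count if count > 0 else 0.0


# Hypothetical example: token id 0 is assumed to be the pad token.
PAD_ID = 0
tokens = [17, 42, 9, 0, 0]
losses = [2.0, 1.0, 3.0, 5.0, 5.0]

mask = build_loss_mask(tokens, PAD_ID)
print(mask)  # [1.0, 1.0, 1.0, 0.0, 0.0]
print(masked_mean_loss(losses, mask))  # 2.0
```

Without the mask, the two pad positions would contribute their loss (5.0 each) and pull the average up to 3.2. With the mask, only the three real tokens count, giving 2.0.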