chengchingwen / Transformers.jl

Julia Implementation of Transformer models
MIT License
523 stars 74 forks

Added DistilBert implementation and Two Examples of Fill Mask with the models DistilBert and Roberta #197

Open deveshjawla opened 7 hours ago

deveshjawla commented 7 hours ago

Hi Peter,

I have implemented the DistilBert model, and it passes all validation tests.

However, I have noticed that Roberta is failing many tests with the following error: Expression: maximum(diff) < max_error Evaluated: 1.0863217f0 < 0.1. That is, the largest difference is about 1.09, well above the 0.1 tolerance.

Other models, such as Bloom and Bert, are passing.
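
For readers unfamiliar with the failure message, here is a minimal sketch of the kind of check it seems to describe: the largest element-wise difference between two outputs is compared against a tolerance. This assumes `diff` is an absolute difference between a reference output and the Julia model's output, which the message itself does not confirm. The helper name `max_abs_diff_check` and the sample values are hypothetical and are not taken from the repository's test suite.

```julia
# Hypothetical helper (not part of Transformers.jl): compare two output
# arrays and report whether the largest absolute difference is within tolerance.
function max_abs_diff_check(reference::AbstractArray, candidate::AbstractArray; max_error = 0.1)
    # Element-wise absolute difference (an assumption about how `diff` is built).
    abs_diff = abs.(reference .- candidate)
    # findmax returns the largest value and its index.
    worst_value, worst_index = findmax(abs_diff)
    return (passed = worst_value < max_error,
            worst_value = worst_value,
            worst_index = worst_index)
end

# Made-up values for illustration only, not real model outputs.
reference = Float32[0.10, 0.25, -0.40]
candidate = Float32[0.12, 0.25, 0.70]

result = max_abs_diff_check(reference, candidate)
println("passed: ", result.passed)
println("largest difference: ", result.worst_value, " at index ", result.worst_index)
```

Reporting the index of the largest difference, rather than only the pass/fail result, may help narrow down which part of the output diverges.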