JasonWayne / deep-learning-essay

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity #247

Open JasonWayne opened 3 years ago

JasonWayne commented 3 years ago

One sentence

Paper

Code

Motivation

Novelties & Key contribution

Method

Experiments

Rethinking

Reference