GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (Paper Explained)
Google builds a 600 billion parameter transformer to do massively multilingual, massive machine translation. Interestingly, the larger model scale does not c...
https://www.youtube.com/watch?v=1VdEw_mGjFk
video
Ooh, this video is so awesome!