How is BERT able to do zero shot transfer on XNLI dataset?

google-research / bert

TensorFlow code and pre-trained models for BERT

Apache License 2.0

38.28k stars 9.62k forks source link

So without any response maybe we can start coming with our own theories 😄

The only transfer mechanism I can identify comes from the sentpiece tokenizer. In cases where words in different languages with similar meaning share a common fragment, the representation learnt for that fragment in one language should be transferable to another language.

This is probably not the whole story though, since e.g the chinese sentpiece vocabulary probably doesn't have a huge overlap with other languages 🤔

google-research / bert

How is BERT able to do zero shot transfer on XNLI dataset? #457