Migrate to pytorch-transformers

This pull request migrates us from pytorch-pretrained-bert to pytorch-transformers. pytorch-transformers implements several features we will need for our next paper (e.g., access to attention weights and mixed-precision training), so the migration is a prerequisite for that work.

It also further decouples Saber's PyTorch models from Saber. For example, we now have two models, each of which subclasses pytorch_transformers.BertPreTrainedModel:

from pytorch_transformers import BertPreTrainedModel

class BertForTokenClassificationMultiTask(BertPreTrainedModel): ...
class BertForEntityAndRelationExtraction(BertPreTrainedModel): ...

These models are almost fully decoupled from Saber, i.e., they can be used outside Saber as ordinary PyTorch modules whose only dependency is pytorch-transformers. With a little more work, they could stand on their own (maybe we should open a pull request against pytorch-transformers?).

Closes #159.
