@matt-peters during issue review it came up that you might propose standardizing on the openai transformer (assuming experiments work out). Is that accurate?
Ideally we could just use another fully featured pytorch transformer implementation (e.g. https://github.com/pytorch/fairseq or possibly https://github.com/huggingface/pytorch-pretrained-BERT) and not have to worry about maintaining the code in allennlp. fairseq would also give us optimized CNNs and fp16 in addition to self-attention.
FWIW, it looks like the new PyTorch 1.1 includes a torch.nn.MultiheadAttention module. Moreover, this module landed as part of a bigger Transformer PR.
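For anyone who wants to try it, here is a minimal sketch of how the new module is called (shapes and defaults are taken from the PyTorch 1.1 docs; inputs are sequence-first, since `batch_first` did not exist yet):

```python
import torch
import torch.nn as nn

# Minimal sketch of the nn.MultiheadAttention API added in PyTorch 1.1.
# Shapes are (seq_len, batch, embed_dim).
embed_dim, num_heads = 512, 8
mha = nn.MultiheadAttention(embed_dim, num_heads, dropout=0.1)

x = torch.randn(20, 4, embed_dim)  # (seq_len=20, batch=4, embed_dim=512)
# Self-attention: query, key, and value are all the same tensor.
attn_output, attn_weights = mha(x, x, x)
print(attn_output.shape)   # torch.Size([20, 4, 512])
print(attn_weights.shape)  # torch.Size([4, 20, 20]), averaged over heads
```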
Thanks @maxdel, we have put together a PR for Transformer in PyTorch. Feel free to comment.
The transformer module is built on torch.nn.MultiheadAttention. We plan to improve the efficiency of this operator in the near future (cc @cpuhrsch).
Hey all, please let us know how we can help you guys deduplicate some of your work and better support this library overall!
We're eager to provide a good set of tools or a full implementation of what you need for your Transformers.
Also, if you have more general feature requests, we'd be eager to hear about them as well.
@cpuhrsch Need some help figuring out the decoder abstraction: https://github.com/allenai/allennlp/pull/2517.
The new version of PyTorch includes an nn.Transformer module.
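A minimal sketch of what using the full encoder-decoder nn.Transformer module looks like (sequence-first shapes; the tensors and sizes here are only illustrative):

```python
import torch
import torch.nn as nn

# Minimal sketch of the nn.Transformer module (encoder-decoder).
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

src = torch.randn(10, 32, 512)  # (src_len, batch, d_model)
tgt = torch.randn(20, 32, 512)  # (tgt_len, batch, d_model)
# Causal mask so each target position only attends to earlier positions.
tgt_mask = model.generate_square_subsequent_mask(tgt.size(0))
out = model(src, tgt, tgt_mask=tgt_mask)
print(out.shape)  # torch.Size([20, 32, 512])
```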
Hello everyone, I just want to check whether we can provide any help applying the nn.Transformer module. We want to better support AllenNLP and avoid duplicate work. We could implement more features if you feel that would better serve this repo.
Updated description to reflect that we don't intend to maintain our own transformer implementation anymore.
If you find any problems or difficulties when migrating to the PyTorch transformer/multiheadattention, please ping me (@zhangguanheng66) on the PyTorch GitHub. We are happy to provide some hands-on help.
@zhangguanheng66 fantastic and thanks for reaching back out. I'm pretty sure we'll be doing this in the next two months.
What we should do: remove all of our internal implementations of multi-headed attention / self attention, in favor of huggingface and pytorch implementations. Some of our old models depend on this code, though. As we split out those models into their own repos, our internal implementations should go with them, when it makes sense. If we have any implementations that are not used in one of the models that we are putting into their own repo, they should just be removed.
We opened an issue to keep a record of change requests for MHA and transformer. If you have implementations that are worth landing in the PyTorch repo, feel free to send requests there: https://github.com/pytorch/pytorch/issues/32590
Notes to self:

- BidirectionalLanguageModelTransformer should go with the model.
- StackedSelfAttentionEncoder can possibly be transparently replaced with a PyTorch implementation if we're careful with the names of the parameters? (See the parameter-remapping sketch below.)

I started adding the transformer implementation from PyTorch here: #3898

This is done already.
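For the "careful with the names of the parameters" point above, the idea would be to remap state-dict keys when loading old weights into the PyTorch module. This is only a hypothetical sketch: the old key names below are invented, and the real mapping would have to be read off both implementations.

```python
import torch
import torch.nn as nn

def remap_old_keys(old_state_dict, key_map):
    """Return a new state dict with keys renamed according to key_map."""
    return {key_map.get(k, k): v for k, v in old_state_dict.items()}

# Target module: a plain torch.nn.TransformerEncoder.
new_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8), num_layers=6
)

# Example mapping; the left-hand keys are made up for illustration.
key_map = {
    "layers.0.attention.in_proj_weight": "layers.0.self_attn.in_proj_weight",
    "layers.0.attention.in_proj_bias": "layers.0.self_attn.in_proj_bias",
    # ... one entry per parameter ...
}

# old_state_dict = torch.load("old_encoder.th")
# new_encoder.load_state_dict(remap_old_keys(old_state_dict, key_map))
```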
@dirkgr Nice job on the transformer wrapper.
I saw there is a seq2seq encoder MultiHeadSelfAttention. Are there any plans for it?
Despite the name, that module is not strictly a transformer as designed by Vaswani et al. Also, it was only used by QaNet, so I moved it there: https://github.com/allenai/allennlp-models/blob/master/allennlp_models/rc/qanet/multi_head_self_attention.py
If you need it, feel free to grab it from there and add it to your own code.
Thanks @dirkgr. I'm wondering if you plan to wrap nn.MultiheadAttention and replace the multi_head_self_attention function. To me, they are very similar.
nn.MultiheadAttention is not a seq2seq module. It implements the attention part of the transformer architecture. The PyTorch transformer code (and thus our wrapper of it) uses nn.MultiheadAttention internally.

If you need it in your own forward() functions, you can just call it! There is no need for AllenNLP to wrap it.
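To illustrate that point, here is a minimal sketch of a module that just holds an nn.MultiheadAttention and calls it in its own forward() (the class name and the residual/LayerNorm choice are only for illustration):

```python
import torch
import torch.nn as nn

class MySelfAttentiveBlock(nn.Module):
    """Toy example: call nn.MultiheadAttention directly, no wrapper needed."""

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x: torch.Tensor, padding_mask: torch.Tensor = None):
        # x: (seq_len, batch, embed_dim); padding_mask: (batch, seq_len), True = pad.
        attended, _ = self.attn(x, x, x, key_padding_mask=padding_mask)
        return self.norm(x + attended)  # residual connection + layer norm
```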
We currently have a handful of transformer implementations. At least the following:
https://github.com/allenai/allennlp/blob/master/allennlp/modules/seq2seq_encoders/multi_head_self_attention.py
https://github.com/allenai/allennlp/blob/master/allennlp/modules/seq2seq_encoders/bidirectional_language_model_transformer.py#L139
https://github.com/allenai/allennlp/blob/master/allennlp/modules/openai_transformer.py
We should remove these and rely on an external implementation, perhaps PyTorch's nn.Transformer. This will (very likely) be a breaking change for our existing models.
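Roughly, such a wrapper could look like the sketch below. This is not the code from #3898; the registry name and constructor arguments are invented, and it assumes an AllenNLP version where masks are bool tensors with True marking real tokens.

```python
import torch
from allennlp.modules.seq2seq_encoders.seq2seq_encoder import Seq2SeqEncoder


@Seq2SeqEncoder.register("my_pytorch_transformer")  # hypothetical registry name
class PytorchTransformerWrapper(Seq2SeqEncoder):
    """Thin Seq2SeqEncoder around torch.nn.TransformerEncoder."""

    def __init__(self, input_dim: int, num_layers: int = 6, num_heads: int = 8,
                 feedforward_dim: int = 2048, dropout: float = 0.1) -> None:
        super().__init__()
        layer = torch.nn.TransformerEncoderLayer(
            d_model=input_dim, nhead=num_heads,
            dim_feedforward=feedforward_dim, dropout=dropout)
        self._encoder = torch.nn.TransformerEncoder(layer, num_layers)
        self._input_dim = input_dim

    def get_input_dim(self) -> int:
        return self._input_dim

    def get_output_dim(self) -> int:
        return self._input_dim

    def is_bidirectional(self) -> bool:
        return True

    def forward(self, inputs: torch.Tensor, mask: torch.BoolTensor) -> torch.Tensor:
        # AllenNLP is batch-first; nn.TransformerEncoder expects (seq_len, batch, dim),
        # and its key_padding_mask uses True to mean "ignore this position".
        output = self._encoder(inputs.transpose(0, 1), src_key_padding_mask=~mask)
        return output.transpose(0, 1)
```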