facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

roberta -- maximum sequence length for PositionalEmbedding? #964

Closed rush86999 closed 5 years ago

rush86999 commented 5 years ago

Looking at the code, it seems 256 is the maximum sequence length for PositionalEmbedding, with emb_dim being the dimension determined by the RoBERTa model variant. I just wanted to confirm this, or is the value 512? I am using the positional embedding module from fairseq in my custom decoder for my custom (attention-only) transformer.
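For reference, a minimal sketch of instantiating fairseq's `PositionalEmbedding` factory for a custom decoder; the concrete values (max positions, embedding dim, padding index) are illustrative assumptions, not taken from this thread:

```python
# Sketch only: instantiate fairseq's PositionalEmbedding factory and run a
# dummy forward pass. Values below are assumptions for illustration.
import torch
from fairseq.modules import PositionalEmbedding

max_positions = 512   # RoBERTa's training limit (see the answer below)
embed_dim = 768       # roberta.base hidden size
padding_idx = 1       # fairseq's default pad index

pos_emb = PositionalEmbedding(
    max_positions, embed_dim, padding_idx, learned=True
)

# Dummy batch of token ids (batch=2, seq_len=10); pad positions are skipped
# when positions are computed from the token tensor.
tokens = torch.randint(4, 100, (2, 10))
positions = pos_emb(tokens)
print(positions.shape)   # -> torch.Size([2, 10, 768])
```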

rush86999 commented 5 years ago

Also, as a follow-up: if the maximum positional embeddings are based on sentence-level parameters, does this mean RoBERTa is not suited to a 2000-word document? In other words, if someone wanted to use it for document-level analysis, such as abstractive summarization, it might not work?

ngoyal2707 commented 5 years ago

We set max_seq_len of TransformerSentenceEncoder with args.max_positions here, which was set to 512 for RoBERTa training.
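A quick way to confirm the limit on a pretrained checkpoint; this is a sketch assuming the standard torch.hub interface (attribute names can differ across fairseq versions):

```python
# Load roberta.base via torch.hub and print its position limit.
import torch

roberta = torch.hub.load('pytorch/fairseq', 'roberta.base')
roberta.eval()

# RoBERTa was trained with args.max_positions = 512, so inputs longer than
# 512 BPE tokens (including <s> and </s>) are rejected by extract_features.
print(roberta.model.max_positions())   # expected: 512
```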

For working with documents longer than 512 tokens, you can chunk the document and use RoBERTa to encode each chunk. Depending on your application, you can either do simple average pooling of the embeddings of each chunk or feed them to any other module on top. A rough sketch of this is below.
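The following is one possible implementation of that chunk-and-pool idea, not an official recipe: it chunks naively on BPE token ids (510 leaves room for the `<s>` and `</s>` specials) and averages chunk embeddings into a single document vector.

```python
# Sketch: encode a long document with roberta.base by chunking and avg pooling.
import torch

roberta = torch.hub.load('pytorch/fairseq', 'roberta.base')
roberta.eval()

def encode_long_document(text, max_tokens=510):
    tokens = roberta.encode(text)        # 1D LongTensor of BPE ids, with <s>/</s>
    bos, eos = tokens[:1], tokens[-1:]
    body = tokens[1:-1]                  # strip specials before chunking
    chunk_embs = []
    for start in range(0, len(body), max_tokens):
        chunk = torch.cat([bos, body[start:start + max_tokens], eos])
        with torch.no_grad():
            feats = roberta.extract_features(chunk)   # (1, seq_len, 768)
        chunk_embs.append(feats.mean(dim=1))          # average-pool within chunk
    return torch.stack(chunk_embs).mean(dim=0)        # average across chunks

doc_embedding = encode_long_document("a very long document ... " * 200)
print(doc_embedding.shape)   # -> torch.Size([1, 768])
```

Instead of the final mean over chunks, the per-chunk embeddings could just as well be fed to a task-specific module on top (e.g. an attention layer or a summarization decoder), as mentioned above.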