Closed: rush86999 closed this issue 5 years ago
Also, as a follow-up: if the maximum positional embeddings are based on sentence-level parameters, does this mean RoBERTa is not optimized for a 2000-word document? In other words, if someone wanted to use it for document-level analysis, such as abstractive summarization, it might not work?
We set `max_seq_len` of `TransformerSentenceEncoder` with `args.max_positions` here, which was set to 512 for RoBERTa training.

For working with documents longer than 512 tokens, you can chunk the document and use RoBERTa to encode each chunk. Depending on your application, you can either do simple average pooling of the embeddings of each chunk or feed them to any other top module.
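For illustration, here is a minimal sketch of that chunk-and-pool idea using the fairseq `RobertaModel` hub interface. The checkpoint path, the 512-token chunking without re-adding `<s>`/`</s>` per chunk, and the mean pooling over tokens within each chunk are my own assumptions, not something prescribed above:

```python
import torch
from fairseq.models.roberta import RobertaModel

# Hypothetical checkpoint location; point this at your pretrained roberta.base.
roberta = RobertaModel.from_pretrained('/path/to/roberta.base', checkpoint_file='model.pt')
roberta.eval()


def encode_long_document(text: str, max_positions: int = 512) -> torch.Tensor:
    # BPE-encode the whole document (this includes <s> and </s> markers).
    tokens = roberta.encode(text)

    # Split into chunks that fit inside the model's positional embeddings.
    chunks = [tokens[i:i + max_positions] for i in range(0, len(tokens), max_positions)]

    chunk_embeddings = []
    with torch.no_grad():
        for chunk in chunks:
            # extract_features returns (1, chunk_len, hidden_dim); mean-pool over tokens.
            features = roberta.extract_features(chunk)
            chunk_embeddings.append(features.mean(dim=1))

    # Simple average pooling across chunks -> one document-level vector.
    return torch.cat(chunk_embeddings, dim=0).mean(dim=0)
```

Instead of averaging the chunk vectors, you could just as well stack them and feed the sequence of chunk embeddings to whatever top module your task needs.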
Looking at the code, it seems 256 is the max sequence length for `PositionalEmbedding`, with `emb_dim` being the dimension from the type of RoBERTa model. I just wanted to confirm this, or is the value 512? I am using the positional embedding module from fairseq in my custom decoder for my custom transformer (attention only).
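Per the reply above, the value used for RoBERTa is 512, not 256. As a minimal sketch (assuming roberta.base hyperparameters: 512 max positions, 768 hidden dim, padding_idx 1, learned positions), fairseq's `PositionalEmbedding` factory can be instantiated like this for a custom decoder:

```python
from fairseq.modules import PositionalEmbedding

max_positions = 512   # args.max_positions used for RoBERTa training
embed_dim = 768       # hidden size of roberta.base; use 1024 for roberta.large
padding_idx = 1

pos_embed = PositionalEmbedding(
    max_positions,
    embed_dim,
    padding_idx,
    learned=True,  # RoBERTa uses learned positional embeddings
)

# With learned embeddings, fairseq offsets the table by padding_idx + 1 internally,
# so the underlying embedding matrix has 514 rows under these assumptions.
print(pos_embed.num_embeddings)
```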