nshinton · closed 3 years ago
SPECTER's underlying pre-trained Transformer model is SciBERT, which has the 512-token sequence length limit of BERT. At this time we only have pre-trained SPECTER starting from SciBERT. If you'd like to process longer inputs, you would need to swap SciBERT for something like Longformer and retrain SPECTER.
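For reference, a minimal sketch of what that swap would look like on the encoder side, using the public `allenai/longformer-base-4096` checkpoint via HuggingFace Transformers. Note this is not code from the SPECTER repo, and the resulting `[CLS]` vectors would not be SPECTER embeddings until the citation-based triplet training is rerun on top of the new encoder:

```python
from transformers import AutoTokenizer, AutoModel

# Longformer supports up to 4096 tokens instead of BERT/SciBERT's 512.
tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
model = AutoModel.from_pretrained("allenai/longformer-base-4096")

text = "A very long document ..."
inputs = tokenizer(text, truncation=True, max_length=4096, return_tensors="pt")
outputs = model(**inputs)

# [CLS]-position vector; only meaningful as a document embedding
# after SPECTER-style retraining.
cls_embedding = outputs.last_hidden_state[:, 0, :]
```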
Hi, I'd like to change the max sequence length in order to embed larger documents.
Is there an extra argument I can give to `embed.py` to do this? I notice that `embed_papers_hf.py` has a `max_length` parameter, but to use that script I need some way to specify that I don't have a GPU. Would appreciate any help with either of these scripts. :)
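If it helps while waiting on the scripts, here is a rough sketch of embedding papers on CPU with the pre-trained `allenai/specter` checkpoint from the HuggingFace hub, bypassing `embed.py` and `embed_papers_hf.py` entirely. A HuggingFace model stays on CPU unless you explicitly move it to a GPU, and inputs past 512 tokens are truncated by the tokenizer:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/specter")
model = AutoModel.from_pretrained("allenai/specter")  # runs on CPU unless moved

# SPECTER-style input: title and abstract joined by the separator token.
papers = [{"title": "BERT", "abstract": "We introduce a new language representation model ..."}]
text = [p["title"] + tokenizer.sep_token + (p.get("abstract") or "") for p in papers]

inputs = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The [CLS] token embedding is the SPECTER document embedding.
embeddings = outputs.last_hidden_state[:, 0, :]
```

This sidesteps the GPU question, but the 512-token limit still applies; raising `max_length` beyond 512 won't work with the SciBERT-based checkpoint, per the answer above.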