Hello, due to a discrepancy in the documentation I am confused about the attention_mask input when pretraining LongformerForMaskedLM. According to https://huggingface.co/docs/transformers/model_doc/longformer#transformers.LongformerForMaskedLM, the default is local attention (1 everywhere) when attention_mask = None. However, I get very different output logits when I set:
model(input_ids, attention_mask = None)
vs when I set
model(input_ids, attention_mask=torch.ones(input_ids.shape)) # (i.e. ones everywhere for local attention).
So what is the correct way to pretrain if I want local attention?
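For reference, here is a minimal sketch of the comparison I am running (assuming the standard "allenai/longformer-base-4096" checkpoint and an example sentence of my own choosing):

```python
import torch
from transformers import LongformerTokenizer, LongformerForMaskedLM

# Assumption: the public base checkpoint; the same question applies to any Longformer MLM.
checkpoint = "allenai/longformer-base-4096"
tokenizer = LongformerTokenizer.from_pretrained(checkpoint)
model = LongformerForMaskedLM.from_pretrained(checkpoint)
model.eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
input_ids = inputs["input_ids"]

with torch.no_grad():
    # Case 1: no attention mask passed at all.
    logits_none = model(input_ids, attention_mask=None).logits
    # Case 2: explicit all-ones mask, i.e. local attention on every token.
    logits_ones = model(input_ids, attention_mask=torch.ones_like(input_ids)).logits

# If the default (None) really means "local attention everywhere",
# these two should agree closely, but they do not for me.
print(torch.allclose(logits_none, logits_ones, atol=1e-5))
print((logits_none - logits_ones).abs().max())
```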