UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

RoBERTa ANCE FirstP forces lowercasing #1831

Open atreyasha opened 1 year ago

atreyasha commented 1 year ago

Description

I was experimenting with the sentence-transformers/msmarco-roberta-base-ance-firstp model and observed a discrepancy in the tokenizer output depending on how the tokenizer is called. See the example below:

from sentence_transformers import SentenceTransformer

# load model
roberta_ance = SentenceTransformer("sentence-transformers/msmarco-roberta-base-ance-firstp")

print(roberta_ance.tokenize(["What is this?"]))
# >>> {'input_ids': tensor([[0, 12196, 16, 42, 116, 2]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1]])}
# decodes to "what is this?"

print(roberta_ance.tokenizer(["What is this?"]))
# >>> {'input_ids': [[0, 2264, 16, 42, 116, 2]], 'attention_mask': [[1, 1, 1, 1, 1, 1]]}
# decodes to "What is this?"

It appears that roberta_ance.tokenize lowercases the input, whereas roberta_ance.tokenizer does not. I confirmed that this is not the case with the base RoBERTa model:

from sentence_transformers import SentenceTransformer

# load model
roberta = SentenceTransformer("roberta-base")

print(roberta.tokenize(["What is this?"]))
# >>> {'input_ids': tensor([[0, 2264, 16, 42, 116, 2]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1]])}
# decodes to "What is this?"

print(roberta.tokenizer(["What is this?"]))
# >>> {'input_ids': [[0, 2264, 16, 42, 116, 2]], 'attention_mask': [[1, 1, 1, 1, 1, 1]]}
# decodes to "What is this?"
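For reference, the decoded strings quoted in the comments above can be reproduced with the tokenizer's decode method (a quick sketch; skip_special_tokens just drops the <s> and </s> markers):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# decode the ids produced by roberta_ance.tokenize (lowercased)
print(tokenizer.decode([0, 12196, 16, 42, 116, 2], skip_special_tokens=True))
# >>> 'what is this?'

# decode the ids produced by roberta.tokenize / the plain tokenizer call
print(tokenizer.decode([0, 2264, 16, 42, 116, 2], skip_special_tokens=True))
# >>> 'What is this?'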

Is this intended behaviour?
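If it helps with triage: the lowercasing presumably comes from the do_lower_case flag on the underlying models.Transformer module, which sentence-transformers reads from the model's sentence_bert_config.json and applies inside tokenize() before calling the HF tokenizer. A minimal sketch to check this (it relies on the private _first_module() helper, so treat it as an assumption about internals rather than public API):

from sentence_transformers import SentenceTransformer

roberta_ance = SentenceTransformer("sentence-transformers/msmarco-roberta-base-ance-firstp")

# SentenceTransformer.tokenize delegates to the first module's tokenize();
# for models.Transformer that method lowercases the input text when
# do_lower_case is set (loaded from sentence_bert_config.json)
transformer = roberta_ance._first_module()
print(transformer.do_lower_case)
# presumably True for this model, which would explain the lowercased ids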

Environment

sentence-transformers==2.2.2

atreyasha commented 1 year ago

Just to add, the Hugging Face tokenizer for sentence-transformers/msmarco-roberta-base-ance-firstp does not perform lowercasing either:

from transformers import AutoTokenizer

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/msmarco-roberta-base-ance-firstp")

print(tokenizer(["What is this?"]))
# >>> {'input_ids': [[0, 2264, 16, 42, 116, 2]], 'attention_mask': [[1, 1, 1, 1, 1, 1]]}
# decodes to "What is this?"
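A possible workaround in the meantime, assuming do_lower_case is the only source of the lowercasing (a sketch, not a recommendation; if the model was trained on lowercased text, disabling the flag may hurt retrieval quality):

from sentence_transformers import SentenceTransformer

roberta_ance = SentenceTransformer("sentence-transformers/msmarco-roberta-base-ance-firstp")

# disable the module-level lowercasing so tokenize() matches tokenizer()
roberta_ance._first_module().do_lower_case = False

print(roberta_ance.tokenize(["What is this?"]))
# expected: {'input_ids': tensor([[0, 2264, 16, 42, 116, 2]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1]])}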