I am trying to evaluate the embeddings generated by ProtMamba, and I have a few questions:
max_position_embeddings = 2048 in the pre-trained model, which prevents me from passing any amino acid sequence longer than this. Since I only want the embedding of a single sequence, which may be longer than 2048 residues but is certainly within the context length of the model, is there any way to bypass this restriction?
For now we decided to fix the maximum positional embedding to 2048, so each individual sequence shouldn't be longer than that. This doesn't limit the context length of the model: you can feed hundreds of sequences as input, as long as each of them is shorter than 2048 residues. We plan to increase the maximum allowed length in the next versions of the model.
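As a minimal sketch of that constraint (not the official ProtMamba API; the constant and helper below are placeholders to adapt to your own loading code), the check is purely per sequence, so a long multi-sequence context is fine as long as no single sequence exceeds 2048 residues:

```python
# Minimal sketch (not the official ProtMamba API): validate that every
# sequence in a context respects the per-sequence positional limit.
# MAX_POSITION_EMBEDDINGS and check_context are illustrative placeholders.

MAX_POSITION_EMBEDDINGS = 2048  # per-sequence limit from the pre-trained config


def check_context(sequences: list[str]) -> None:
    """Raise if any single sequence exceeds the positional-embedding limit.

    The overall context (concatenation of many sequences) can be much longer;
    only each individual sequence is capped at MAX_POSITION_EMBEDDINGS.
    """
    for i, seq in enumerate(sequences):
        if len(seq) > MAX_POSITION_EMBEDDINGS:
            raise ValueError(
                f"sequence {i} has {len(seq)} residues, "
                f"exceeding the per-sequence limit of {MAX_POSITION_EMBEDDINGS}"
            )


# Example: a context of many homologs, each shorter than 2048 residues,
# is valid even though the total context length far exceeds 2048.
homologs = ["MKTAYIAKQR" * 100] * 50  # 50 sequences of 1000 residues each
check_context(homologs)  # passes: each individual sequence is <= 2048
```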
What is the meaning of vocab_size = 50277?
Thanks!