Hi David,
Thank you for pointing that out.
I ran some tests, and the behavior may be caused by the propagation of numerical errors. With the model in torch.bfloat16 I was able to reproduce what you saw; when I tried again in torch.float32, I did not encounter the same discrepancy.
I haven't found any reference to this issue on the Mamba forum, so I'm not completely sure why it happens.
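As a rough, model-independent illustration of the kind of precision gap involved (plain PyTorch, only meant to show that bfloat16 and float32 results diverge slightly and that such differences can compound across layers):

import torch

torch.manual_seed(0)
a = torch.randn(256, 256)
b = torch.randn(256, 256)

# Same matrix product in float32 vs. bfloat16: the results differ
# slightly, and in a deep model these small differences compound.
ref = a @ b
low = (a.to(torch.bfloat16) @ b.to(torch.bfloat16)).to(torch.float32)
print((ref - low).abs().max())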
You can try loading the model using:
model = load_model(
    checkpoint,
    model_class=MambaLMHeadModelwithPosids,
    device=device,
    dtype=torch.float32,
    checkpoint_mixer=False,
).eval()
instead of torch.bfloat16 and see if it also works better for you.
I am evaluating the use of the last vector in the last hidden layer as an embedding for a given input sequence.
I noticed that if I pass multiple sequences in a batch, I get a different embedding than if I pass them in one at a time.
For example, passing two sequences through the model in a single batch returns a different embedding for the first sequence than running that same sequence through on its own, as in the sketch below.
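A minimal sketch of the comparison, assuming same-length sequences (the encode helper and the hidden_states attribute are illustrative placeholders, not the repo's exact API):

import torch

# Hypothetical helpers: `encode` turns sequences into input ids and the
# model call exposes per-token hidden states; the real API may differ.
seqs = ["SEQ_A", "SEQ_B"]

with torch.no_grad():
    # Batched forward pass over both sequences
    batched = model(encode(seqs)).hidden_states[-1]     # (2, L, d)
    emb_batched = batched[0, -1]                         # first sequence, last position

    # Forward pass over the first sequence alone
    single = model(encode(seqs[:1])).hidden_states[-1]   # (1, L, d)
    emb_single = single[0, -1]

# In bfloat16 these can differ noticeably; in float32 they should agree
# up to small numerical noise.
print(torch.allclose(emb_batched, emb_single, atol=1e-5))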