Open cowjen01 opened 2 years ago
It depends on how close the new sample is, on average, to the first 1000 samples. If it's a nearest neighbor of many of the original samples, then the embedding may look a bit different.
A few things you can try:
- Pass `eps` to the embed method (`mde.embed(verbose=True, eps=1e-6)`, for example).
- Use an `Anchored` constraint to pin the first 1000 samples to the original embedding.

If you give me access to the data I can play around with your example when I have some free time.
Additionally, I see that the log contains the following line:
Feb 21 07:21:55 PM: Your dataset appears to contain duplicated items (rows); when embedding, you should typically have unique items.
Having duplicated items is typically ill-advised (and can sometimes lead to unexpected behavior), since it doesn't really make sense in the context of the embedding problem. You don't need two representations of the same thing.
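One way to drop the duplicated rows before embedding, as a minimal sketch assuming the input is a dense NumPy array (the helper `drop_duplicate_rows` and the toy matrix are hypothetical, not part of PyMDE):

```python
import numpy as np


def drop_duplicate_rows(data):
    """Return the unique rows in first-occurrence order, plus their original indices."""
    # np.unique sorts the rows; return_index gives the first occurrence of each.
    _, first_idx = np.unique(data, axis=0, return_index=True)
    keep = np.sort(first_idx)  # restore the original row order
    return data[keep], keep


# Toy example: row 2 duplicates row 0.
data = np.array([[0.0, 1.0], [2.0, 3.0], [0.0, 1.0], [4.0, 5.0]])
unique_rows, kept = drop_duplicate_rows(data)
```

Keeping the `kept` indices makes it possible to map each embedded point back to its row in the original matrix.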
My code is as follows:
When I use the first 1,000 samples from the input matrix, I get very different results than when I use one more sample (1,001).
Here is the log:
And here are the output embeddings:
Is this expected behaviour? I thought adding one sample should not make that much difference.
Thank you for helping me out!