Adversarian opened 2 months ago
You can just use sklearn's `normalize` on the un-normalized data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

# Encode a sample of sentences and L2-normalize the embeddings
pca_train_sentences = nli_sentences[0:20000]
train_embeddings = model.encode(pca_train_sentences, convert_to_numpy=True)
normalized_embeddings = normalize(train_embeddings)

# Compute PCA on the normalized train embeddings matrix
pca = PCA(n_components=new_dimension)
pca.fit(normalized_embeddings)
pca_comp = np.asarray(pca.components_)
```
Or just pass `normalize_embeddings=True` to `model.encode`:

```python
# model.encode can L2-normalize the embeddings directly
pca_train_sentences = nli_sentences[0:20000]
train_embeddings = model.encode(pca_train_sentences, convert_to_numpy=True, normalize_embeddings=True)

# Compute PCA on the (already normalized) train embeddings matrix
pca = PCA(n_components=new_dimension)
pca.fit(train_embeddings)
pca_comp = np.asarray(pca.components_)
```
@Jakobhenningjensen Thanks for your response!

The idea is to avoid external preprocessing and have the model perform end-to-end forward passes natively, which is why we fill a dense layer with the PCA components here instead of calling `pca.transform` on new inputs every time. That means your first suggestion isn't desirable, as it requires sklearn's `normalize` at every inference call.
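For context, this is roughly how the example bakes the PCA into the model itself (a sketch based on the `dimensionality_reduction.py` example; `model`, `pca_comp`, and `new_dimension` come from the snippets above):

```python
import torch
from sentence_transformers import models

# Add a dense layer whose weights are the PCA components, so the
# projection happens inside the model's own forward pass
dense = models.Dense(
    in_features=model.get_sentence_embedding_dimension(),
    out_features=new_dimension,
    bias=False,
    activation_function=torch.nn.Identity(),
)
dense.linear.weight = torch.nn.Parameter(torch.tensor(pca_comp))
model.add_module("dense", dense)
```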
Secondly, `LayerNorm` and `torch.nn.functional.normalize` (which is what happens if you set `normalize_embeddings=True`) do very different things. Since PCA is sensitive to the scale of the data, it's good practice to z-score standardize your data before fitting a PCA on top of it, which is what `LayerNorm` with `elementwise_affine=False` does (or leaving `elementwise_affine=True` and never using it in training mode). Meanwhile, `torch.nn.functional.normalize` simply divides each tensor by its $L_p$ norm to ensure that all tensors have unit length in $L_p$ space. I'm not sure whether these two scenarios are mathematically equivalent from the point of view of PCA; I'm just pointing out the differences.
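To make the difference concrete, here's a small sketch in plain PyTorch (the values are illustrative only):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])

# LayerNorm without affine params: subtract each row's mean and divide by its
# std, so every row ends up with zero mean and unit variance
ln = torch.nn.LayerNorm(3, elementwise_affine=False)
print(ln(x))   # both rows become [-1.2247, 0.0, 1.2247]

# F.normalize: divide each row by its L2 norm, so every row has unit length
# but keeps its nonzero mean
print(F.normalize(x, p=2, dim=1))   # both rows become [0.2673, 0.5345, 0.8018]
```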
Hi, I would like to begin by thanking you for your tremendous work on this library.
I had a question regarding your `dimensionality_reduction.py` example. This is where you fit a PCA on top of the embeddings obtained by your model:
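```python
# (snippet from dimensionality_reduction.py)
pca_train_sentences = nli_sentences[0:20000]
train_embeddings = model.encode(pca_train_sentences, convert_to_numpy=True)

# Compute PCA on the train embeddings matrix
pca = PCA(n_components=new_dimension)
pca.fit(train_embeddings)
pca_comp = np.asarray(pca.components_)
```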
I was wondering if it would be better if a `LayerNorm` with `elementwise_affine=False` was first appended to the model, to ensure PCA receives standardized inputs. I've extended sentence-transformers' `models.LayerNorm` so that it accepts additional args and kwargs for `self.norm`, and performed this experiment on my own dataset (which I'm not at liberty to share, unfortunately); it seems to perform better than plain PCA with no `LayerNorm`. I was just wondering if I somehow got lucky with my particular data, or if it's something to actually consider when performing dimensionality reduction.
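In case it helps, this is roughly the extension I mean (a sketch; the subclass name is mine, and it assumes `models.LayerNorm` keeps its internal `self.norm = torch.nn.LayerNorm(dimension)`):

```python
import torch
from sentence_transformers import models

class ConfigurableLayerNorm(models.LayerNorm):
    """models.LayerNorm, but forwarding extra args/kwargs to torch.nn.LayerNorm."""

    def __init__(self, dimension: int, *args, **kwargs):
        super().__init__(dimension)
        # Rebuild the inner norm with the extra arguments, e.g.
        # elementwise_affine=False for a parameter-free standardization
        self.norm = torch.nn.LayerNorm(dimension, *args, **kwargs)

# Appended to the model before fitting the PCA on its embeddings
layer_norm = ConfigurableLayerNorm(
    model.get_sentence_embedding_dimension(), elementwise_affine=False
)
model.add_module("layer_norm", layer_norm)
```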
Thanks in advance!