Open guilherme-marchezini opened 2 years ago
Have you tried to directly load the annoy instance? It could be done using something like this:
embedding = pacmap.PaCMAP() # initialize/load the saved pacmap instance
embedding.tree = load_annoy_tree() # your function that loads the annoy instance
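The same split can be sketched with the standard library alone: pickle everything except the tree, then reattach a tree that was restored separately. This is only an illustration under stated assumptions. `FittedModel`, `dump_without_tree`, and `load_with_tree` are hypothetical names, not part of PaCMAP, and a `threading.Lock` stands in for the annoy tree simply because it also cannot be pickled.

```python
import copy
import pickle
import threading


class FittedModel:
    """Hypothetical stand-in for a fitted model; this is not the PaCMAP class."""

    def __init__(self):
        self.n_neighbors = 10
        # A lock cannot be pickled, so it plays the role of the annoy tree here.
        self.tree = threading.Lock()


def dump_without_tree(model):
    """Pickle a shallow copy of the model with the tree detached."""
    clone = copy.copy(model)  # the original model keeps its tree
    clone.tree = None
    return pickle.dumps(clone)


def load_with_tree(payload, tree):
    """Unpickle the model and reattach a tree that was restored separately."""
    model = pickle.loads(payload)
    model.tree = tree
    return model


model = FittedModel()
payload = dump_without_tree(model)  # bytes, e.g. for a binary column in a DB
restored = load_with_tree(payload, threading.Lock())
```

With this approach every other fitted attribute travels inside the pickle, so only the tree needs its own save and load step.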
Hello! I tried what you suggested, and even filled in the other attributes that the method requires to run:
u = AnnoyIndex(0)
u.load('test.ann')
embedding.tree = u
embedding.xmin = emb_model.xmin
embedding.xmax = emb_model.xmax
embedding.xmean = emb_model.xmean
embedding.tsvd_transformer = emb_model.tsvd_transformer
embedding.pair_FP = emb_model.pair_FP
embedding.pair_MN = emb_model.pair_MN
embedding.pair_neighbors = emb_model.pair_neighbors
embedding.n_neighbors = emb_model.n_neighbors
embedding.transform(feature_matrix_c)
But I still get:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/tmp/ipykernel_26194/902635302.py in <module>
----> 1 embedding.transform(feature_matrix_c)
/opt/conda/lib/python3.9/site-packages/pacmap/pacmap.py in transform(self, X, basis, init, save_pairs)
932 self.apply_pca, self.verbose)
933 # Sample pairs
--> 934 self.pair_XP = generate_extra_pair_basis(basis, X,
935 self.n_neighbors,
936 self.tree,
/opt/conda/lib/python3.9/site-packages/pacmap/pacmap.py in generate_extra_pair_basis(basis, X, n_neighbors, tree, distance, verbose)
417
418 for i in range(npr):
--> 419 nbrs[i, :], knn_distances[i, :] = tree.get_nns_by_vector(
420 X[i, :], n_neighbors_extra, include_distances=True)
421
IndexError: Vector has wrong length (expected 0, got 17)
It looks like the problem is in how you initialize the AnnoyIndex. Your data appears to have 17 dimensions, so to load the annoy index you should initialize it with u = AnnoyIndex(17) instead of u = AnnoyIndex(0).
For some reason I cannot load the saved PaCMAP tree with AnnoyIndex(17). I have to load it with AnnoyIndex(18), but then the transform function crashes. I don't know whether this is a PaCMAP problem or an annoy problem. Either way, it would be nice to have a PaCMAP function that correctly saves and loads its models.
I see. We will work on that feature.
Hello. I'm trying to store the PaCMAP model in a DB for further transformations. I tried to pickle it, but the tree is an annoy.annoy object, which cannot be pickled. I also tried saving the annoy.annoy object with embedding.tree.save('./annoy_object.ann'). This works, but I cannot load it back, since creating a PaCMAP instance does not initialize the annoy.annoy tree. Is there a way to save and load a PaCMAP object or its tree? My main objective is to send it to a DB so I can transform new incoming data in my clustering pipeline.
Thanks for your attention.