Open abhidg opened 1 week ago
Sketch
```python
import numpy as np

from l2g import make_patch_graph, DataLoader, make_embedding, align_embeddings
# TODO: see what other graph embedding libraries use and try to be compatible
# L2Gv2 should be able to work with any embedding
from l2g.embeddings import VGAEEmbedding
# Local2Global is the old algorithm, ManifoldOptimizer the new one
from l2g.align import Local2Global, ManifoldOptimizer

# Load data
ds = DataLoader('l2gv2/nas')  # loads from web (HuggingFace?)
# patch_identifier: str | Callable[[V], str] — maps a node to its patch label
P = make_patch_graph(ds, patch_identifier)
vgae = VGAEEmbedding(**kwargs)

# Create embeddings; can use trivial parallelism here (multiprocessing.Pool)
embs: dict[str, np.ndarray] = make_embedding(vgae, P)  # calls emb.fit_transform(P[i]) for patch node i
# ^ do node and edge embeddings need to be disambiguated?

# Alignment
aligner = ManifoldOptimizer()
# .fit() could generate the alignment criteria (scaling, orthogonal transformations
# and translation) whereas .fit_transform() applies it. Not clear whether keeping
# them separate makes sense.
X = aligner.fit_transform(embs)  # X is an xarray with node labels
```
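To make the alignment step concrete: for a single pair of patches, the "orthogonal transformation and translation" criterion mentioned above amounts to an orthogonal Procrustes problem on the nodes the patches share. The sketch below is a minimal, self-contained illustration (not the library's implementation; the function name `procrustes_align` is made up for this example, and scaling is omitted):

```python
import numpy as np

def procrustes_align(ref: np.ndarray, emb: np.ndarray) -> np.ndarray:
    """Align `emb` to `ref` (rows = shared overlap nodes) using an
    orthogonal transformation plus translation."""
    # Centre both point clouds
    ref_c = ref - ref.mean(axis=0)
    emb_c = emb - emb.mean(axis=0)
    # Orthogonal Procrustes: rotation minimising ||emb_c @ Q - ref_c||_F
    u, _, vt = np.linalg.svd(emb_c.T @ ref_c)
    rot = u @ vt
    return emb_c @ rot + ref.mean(axis=0)

rng = np.random.default_rng(0)
ref = rng.normal(size=(10, 2))
# Simulate a second patch: same points, rotated and shifted
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
emb = ref @ R + np.array([3.0, -1.0])

aligned = procrustes_align(ref, emb)
print(np.allclose(aligned, ref))  # True: rotation and shift recovered
```

A full aligner would solve this jointly over all patch pairs in the patch graph rather than against a single reference patch.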
We need to consider how much of this is portable to large graphs (perhaps by using dask and xarray). Should the use of multiprocessing / GPU / cluster be transparent to the user, which adds complexity, or should we handle it ourselves (e.g. defaulting to CPU for toy datasets) and allow the user to override as necessary?
An end-user API should cover at least the following points:
Further processing of embeddings, such as using them for classification, is out of scope for this issue.