OxfordRSE / L2Gv2


Start developing end-user API for temporal graphs #29

Open abhidg opened 1 week ago

abhidg commented 1 week ago

An end-user API should cover at least the following points:

Further processing of embeddings, such as using them for classification is out of scope for this issue.

abhidg commented 4 days ago

Sketch

```python
import numpy as np

from l2g import make_patch_graph, DataLoader, make_embedding, align_embeddings

# TODO: see what other graph embedding libraries use and try to be compatible
# L2Gv2 should be able to work with any embedding
from l2g.embeddings import VGAEEmbedding

# Local2Global is the old algorithm, ManifoldOptimizer the new one
from l2g.align import Local2Global, ManifoldOptimizer

# Load data
ds = DataLoader('l2gv2/nas')  # loads from the web (HuggingFace?)

# patch_identifier: str | Callable[[V], str], i.e. either a string or a
# function mapping a node of type V to the name of its patch
P = make_patch_graph(ds, patch_identifier)
vgae = VGAEEmbedding(**kwargs)

# Create embeddings; trivial parallelism is possible here (multiprocessing.Pool).
# make_embedding calls vgae.fit_transform(P[i]) for each patch node i.
embs: dict[str, np.ndarray] = make_embedding(vgae, P)
# Open question: do node and edge embeddings need to be disambiguated?

# Alignment
aligner = ManifoldOptimizer()

# .fit() could compute the alignment (scaling, orthogonal transformations and
# translations) and .transform() apply it, with .fit_transform() doing both.
# Not clear whether keeping them separate makes sense.
X = aligner.fit_transform(embs)  # X is an xarray with node labels
```
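One possible reading of `patch_identifier: str | V -> str` is sketched below: a string names a node attribute that holds the patch name, and a callable maps a node to its patch name. Everything here is hypothetical: `resolve_patch_identifier`, the `node_attrs`/`edges` inputs, and the choice to copy boundary edges (and their endpoints) into both adjacent patches so that neighbouring patches overlap, which the alignment step needs. It is a sketch under these assumptions, not the planned `l2g.make_patch_graph`.

```python
from collections import defaultdict
from typing import Callable, Hashable, Union

Node = Hashable
PatchIdentifier = Union[str, Callable[[Node], str]]


def resolve_patch_identifier(
    patch_identifier: PatchIdentifier, node_attrs: dict[Node, dict]
) -> Callable[[Node], str]:
    """Turn a patch identifier into a function node -> patch name.

    Assumption: a string is the name of a node attribute holding the patch name.
    """
    if isinstance(patch_identifier, str):
        return lambda v: str(node_attrs[v][patch_identifier])
    return patch_identifier


def make_patch_graph(
    node_attrs: dict[Node, dict],
    edges: list[tuple[Node, Node]],
    patch_identifier: PatchIdentifier,
):
    """Hypothetical sketch: split a graph into overlapping patches.

    Returns (patches, patch_graph): patches maps patch name -> (node set, edge
    list); patch_graph is the set of patch pairs joined by an original edge.
    """
    key = resolve_patch_identifier(patch_identifier, node_attrs)
    patch_nodes = defaultdict(set)
    patch_edges = defaultdict(list)
    for v in node_attrs:
        patch_nodes[key(v)].add(v)
    patch_graph = set()
    for u, v in edges:
        pu, pv = key(u), key(v)
        patch_edges[pu].append((u, v))
        if pu != pv:
            # Boundary edge: copy it and both endpoints into both patches so
            # that neighbouring patches share nodes for later alignment.
            patch_edges[pv].append((u, v))
            patch_nodes[pu].add(v)
            patch_nodes[pv].add(u)
            patch_graph.add(tuple(sorted((pu, pv))))
    patches = {p: (patch_nodes[p], patch_edges[p]) for p in patch_nodes}
    return patches, patch_graph
```

For a temporal graph the identifier could, for example, be a year attribute (`make_patch_graph(node_attrs, edges, "year")`) or a function of the node.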
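The `make_embedding` step could look roughly like this, assuming each patch is given as a dense adjacency matrix and the embedder exposes `fit_transform` as described in the sketch. `AdjacencySpectralEmbedding` is a hypothetical toy stand-in for `VGAEEmbedding`, and the `processes` parameter is an assumption: with `processes > 1` the patches are embedded in a `multiprocessing.Pool`, otherwise serially.

```python
from multiprocessing import Pool

import numpy as np


class AdjacencySpectralEmbedding:
    """Toy stand-in embedder (hypothetical): the `dim` eigenvectors of the
    symmetric adjacency matrix with largest |eigenvalue|, scaled by sqrt(|eigenvalue|)."""

    def __init__(self, dim: int = 2):
        self.dim = dim

    def fit_transform(self, adjacency: np.ndarray) -> np.ndarray:
        vals, vecs = np.linalg.eigh(adjacency)
        order = np.argsort(np.abs(vals))[::-1][: self.dim]
        return vecs[:, order] * np.sqrt(np.abs(vals[order]))


def make_embedding(
    embedder, patches: dict[str, np.ndarray], processes: int = 1
) -> dict[str, np.ndarray]:
    """Embed every patch independently (trivially parallel across patches)."""
    names = list(patches)
    adjacencies = [patches[name] for name in names]
    if processes > 1:
        # Bound methods pickle fine as long as the embedder itself is picklable.
        with Pool(processes) as pool:
            results = pool.map(embedder.fit_transform, adjacencies)
    else:
        results = [embedder.fit_transform(a) for a in adjacencies]
    return dict(zip(names, results))
```

With the spawn start method (the default on macOS and Windows), a parallel call such as `make_embedding(emb, patches, processes=4)` must run under an `if __name__ == "__main__":` guard.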
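To make the `.fit()` / `.transform()` split concrete, here is a minimal sketch that aligns each patch to a single reference patch with a similarity transform x ↦ s·x·R + t (scaling, an orthogonal map from orthogonal Procrustes, translation), then averages each node's coordinates over the patches containing it. It assumes each patch embedding comes with its node labels. `ProcrustesAligner` is a hypothetical name, and this reference-based scheme is neither Local2Global's synchronization nor the ManifoldOptimizer. It returns a label list plus a NumPy array rather than an xarray object.

```python
import numpy as np


class ProcrustesAligner:
    """Hypothetical sketch: align every patch to the first (reference) patch.

    Input: dict patch name -> (node labels, coordinates of shape (n, d)).
    """

    def fit(self, embs):
        names = list(embs)
        ref_nodes, ref_X = embs[names[0]]
        ref_index = {v: i for i, v in enumerate(ref_nodes)}
        self.transforms_ = {}
        for name in names:
            nodes, X = embs[name]
            common = [i for i, v in enumerate(nodes) if v in ref_index]
            if len(common) < X.shape[1] + 1:
                raise ValueError(f"patch {name!r} shares too few nodes with the reference")
            A = X[common]
            B = ref_X[[ref_index[nodes[i]] for i in common]]
            mA, mB = A.mean(axis=0), B.mean(axis=0)
            Ac, Bc = A - mA, B - mB
            s = np.linalg.norm(Bc) / np.linalg.norm(Ac)  # scaling
            U, _, Vt = np.linalg.svd(Ac.T @ Bc)
            R = U @ Vt  # orthogonal Procrustes solution
            self.transforms_[name] = (s, R, mB - s * mA @ R)  # translation last
        return self

    def transform(self, embs):
        """Apply the fitted transforms and average each node over its patches."""
        sums, counts = {}, {}
        for name, (nodes, X) in embs.items():
            s, R, t = self.transforms_[name]
            Y = s * X @ R + t
            for v, y in zip(nodes, Y):
                sums[v] = sums.get(v, 0) + y
                counts[v] = counts.get(v, 0) + 1
        labels = list(sums)
        return labels, np.array([sums[v] / counts[v] for v in labels])

    def fit_transform(self, embs):
        return self.fit(embs).transform(embs)
```

Keeping `.fit()` separate lets a user inspect the fitted scales, rotations and translations (`transforms_`) before merging, while `.fit_transform()` covers the common case.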

Need to consider how much of this is portable to large graphs (perhaps by using dask and xarray). Should the use of multiple processors / GPUs / a cluster be transparent to the user, which adds complexity, or should we handle that ourselves (e.g. using the CPU for toy datasets) while allowing the user to override as necessary?
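The second option (the library picks a sensible default and the user can override it) could be as simple as the policy below. The threshold `SMALL_GRAPH_NODES`, the `ExecutionPlan` type and the decision order are illustrative assumptions, not measured values; GPU detection is left to the caller via `gpu_available`.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: below this many nodes, stay on a single CPU process.
SMALL_GRAPH_NODES = 10_000


@dataclass
class ExecutionPlan:
    device: str  # "cpu" or "gpu"
    processes: int


def choose_execution(
    n_nodes: int,
    n_patches: int,
    gpu_available: bool,
    cpu_count: int,
    override: Optional[ExecutionPlan] = None,
) -> ExecutionPlan:
    """Pick a default execution plan; an explicit user override always wins."""
    if override is not None:
        return override
    if n_nodes < SMALL_GRAPH_NODES:
        return ExecutionPlan("cpu", 1)  # toy datasets: no parallelism overhead
    if gpu_available:
        return ExecutionPlan("gpu", 1)
    # One worker per patch, capped by the number of CPUs.
    return ExecutionPlan("cpu", max(1, min(cpu_count, n_patches)))
```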