mdeff opened this issue 3 years ago
I'm not sure whether that's in scope. But IIRC GraphWave uses GSP / spectral graph concepts, right? It should be pretty close to our `features.compute_spectrogram()`, so it could go in that module (which would benefit from some better documentation). @nperraud what do you think?

@pulquero you could paste your implementation here.
It is essentially the characteristic function of the tig. The snippet below includes three ways to calculate it, but maybe you just want to go with the middle one; it's a reasonable balance between speed and memory.
```python
import numpy as np

def embedding(g, ts, nodes=None, method=None, **kwargs):
    """
    GraphWave embedding.

    Example: ts = np.linspace(0, 100, 25)
    Returns a tensor of (nodes, filters, chi).
    """
    N = g.G.N
    if nodes is None:
        s = np.identity(N)
    else:
        s = np.zeros((N, len(nodes)))
        for i, n in enumerate(nodes):
            s[n][i] = 1.0
    tig = g.filter(s[..., np.newaxis], **kwargs)
    if s.shape[1] == 1:  # single node
        tig = tig[:, np.newaxis, :]
    if tig.ndim == 2:  # single filter
        tig = tig[..., np.newaxis]
    assert tig.shape == s.shape + (tig.shape[2],)
    if method is None:
        if N < 100:
            method = 'kron'
        elif N < 10000:
            method = 'partial-kron'
        else:
            method = 'loop'
    if method == 'kron':
        # fully vectorized
        tig_t_grid = np.kron(tig[..., np.newaxis], ts)
        def chi(xt):
            return np.mean(np.exp(xt*1j), axis=0)
        return chi(tig_t_grid)
    elif method == 'partial-kron':
        # more memory efficient
        def chi(xt):
            return np.mean(np.exp(xt*1j), axis=0)
        emb = np.empty((tig.shape[1], tig.shape[2], ts.shape[0]), dtype=complex)
        for i in range(tig.shape[1]):
            for j in range(tig.shape[2]):
                tig_t_grid = np.kron(tig[:, i, j, np.newaxis], ts)
                emb[i, j] = chi(tig_t_grid)
        return emb
    elif method == 'loop':
        # every byte counts
        def chi(x, t):
            return np.mean(np.exp(x*t*1j), axis=0)
        emb = np.empty((tig.shape[1], tig.shape[2], ts.shape[0]), dtype=complex)
        for i in range(tig.shape[1]):
            for j in range(tig.shape[2]):
                for k, t in enumerate(ts):
                    emb[i, j, k] = chi(tig[:, i, j], t)
        return emb
```
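As a sanity check, the `'kron'` and `'loop'` paths can be exercised on a toy tig tensor with plain NumPy. The random tig values below are a hypothetical stand-in for the output of `g.filter()`, not actual pygsp output:

```python
import numpy as np

# Toy stand-in for the (nodes, signals, filters) tig returned by g.filter().
rng = np.random.default_rng(0)
tig = rng.normal(size=(6, 6, 3))
ts = np.linspace(0, 100, 25)

# 'kron' method: sample the empirical characteristic function
# chi(t) = mean_n exp(1j * t * tig[n]) at every t in ts, fully vectorized.
tig_t_grid = np.kron(tig[..., np.newaxis], ts)   # (nodes, signals, filters, len(ts))
emb = np.mean(np.exp(1j * tig_t_grid), axis=0)   # (signals, filters, len(ts))

# 'loop' method for comparison: one (signal, filter, t) triple at a time.
emb_loop = np.empty_like(emb)
for i in range(tig.shape[1]):
    for j in range(tig.shape[2]):
        for k, t in enumerate(ts):
            emb_loop[i, j, k] = np.mean(np.exp(1j * t * tig[:, i, j]))
```

Both paths agree to floating-point precision; at `t = 0` the characteristic function is exactly 1.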
Thanks!
> It is essentially the characteristic function of the tig.

That's good.
Waiting on @nperraud to weigh in. Then we can see how to integrate it properly.
I'm interested to see how `compute_spectrogram()` compares. Any refs I can read? It looks like it should work in a similar way. Previously, I've looked at the spectral graph wavelet signature (https://arxiv.org/abs/1705.06250), but that is only really suitable for equally-sized graphs: the signature doesn't really transfer across arbitrary graphs. GraphWave should be more 'portable' in that sense, although it classifies individual nodes rather than graphs; but you can always take a bag-of-features approach as with the signature (the signature is essentially just built from the raw filter coefficients; GraphWave goes a little bit further by applying the characteristic function).
I too believe it should be quite similar. `compute_spectrogram()` was developed by @nperraud. AFAIK it has only been published in his thesis (chapter 3).
The main difference between https://arxiv.org/abs/1710.10321 (at least the first version of the paper) and my thesis is that we use different ways to aggregate the features. They make some histogram, I believe, and I compute the L2 norm. The reason I use the L2 norm is that I can compute the features for all the nodes very efficiently that way. See my thesis (chapter 3) for more information. I never published this and nobody knows that I did it; it was done back in 2017. If you are interested in developing this, please do contact me privately.
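To make the aggregation difference concrete, here is a rough NumPy sketch contrasting the two approaches. The tig values are random toy data and the bin edges are arbitrary; neither snippet claims to reproduce the exact GraphWave or thesis pipelines:

```python
import numpy as np

# Hypothetical filtered-delta tensor: tig[:, n, k] stands in for g_k(L) delta_n,
# shape (nodes, nodes, filters) -- toy random values, not pygsp output.
rng = np.random.default_rng(1)
tig = rng.normal(size=(8, 8, 4))

# L2-norm aggregation (thesis / compute_spectrogram style):
# one scalar per (node, filter), cheap and fully vectorized.
spectrogram = np.linalg.norm(tig, axis=0)        # (nodes, filters)

# Histogram aggregation (GraphWave-paper style, per my reading):
# a binned distribution of coefficients per (node, filter).
bins = np.linspace(-3, 3, 11)
hists = np.stack([
    np.stack([np.histogram(tig[:, n, k], bins=bins)[0]
              for k in range(tig.shape[2])])
    for n in range(tig.shape[1])
])                                               # (nodes, filters, 10)
```

The L2 norm collapses each filtered delta to one number, which is why it vectorizes over all nodes at once; the histogram (or characteristic function) keeps a distributional summary per node.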
Shall we consolidate `compute_spectrogram()` and GraphWave in the PyGSP?
We probably should... But this would have to come with a nice demo/tutorial on how to compute node features. I could try to draft a proposal soon.
Please also add a `nodes` parameter to `compute_spectrogram()` so it can be computed for subsets of nodes. (I use this to batch up nodes and distribute the work on Spark.)
Btw, I believe there is a 'technical' bug in `compute_norm_tig`: I think the norm should be taken over the 'nodes' (axis=0), not the 'signal'.
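A minimal illustration of the axis question, using toy values rather than the actual `compute_norm_tig` code:

```python
import numpy as np

# Toy tig matrix: column n stands in for the filtered delta g(L) delta_n.
tig = np.arange(12, dtype=float).reshape(4, 3)   # (nodes, signals)

# Norm over the nodes: one scalar per localized signal -- the intended feature.
norm_over_nodes = np.linalg.norm(tig, axis=0)    # shape (3,)

# Norm over the signals: one scalar per node -- the suspected bug.
norm_over_signals = np.linalg.norm(tig, axis=1)  # shape (4,)
```

The two disagree in both shape and meaning whenever the tig is not square-symmetric, so mixing up the axis silently changes what the spectrogram measures.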
> The main difference between https://arxiv.org/abs/1710.10321 (at least the first version of the paper) and my thesis is that we use different ways to aggregate the features. [...] If you are interested in developing this, please do contact me privately.
We should have written a short paper 5 years ago ...
(Heat) kernel centrality should be added too (the norm of the tig); it's very similar to the other two.
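A sketch of what such a centrality could look like on a toy path graph, assuming "norm of the tig" means the L2 norm of the heat kernel localized at each node (my reading, not a confirmed definition):

```python
import numpy as np

# Toy path graph on 4 nodes: combinatorial Laplacian L = D - A.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

# Heat kernel exp(-tau * L) via eigendecomposition;
# column n is the heat tig localized at node n.
lam, U = np.linalg.eigh(L)
tau = 1.0
tig = U @ np.diag(np.exp(-tau * lam)) @ U.T

# Candidate (heat) kernel centrality: L2 norm of each localized kernel.
centrality = np.linalg.norm(tig, axis=0)
```

Since the eigenvalues of `exp(-tau * L)` lie in (0, 1], each centrality is at most 1, and the values respect the symmetry of the path graph (endpoints match, inner nodes match).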
From @pulquero [https://github.com/epfl-lts2/pygsp/issues/94#issuecomment-803219578]: