Hello @youssef62, I believe it is exactly the same principle as in your previously opened issue #530 ...
I am not sure about that, because I don't get integers as output this time. I also still get the same results even after adding astype(np.float64) to c1 and c2 (and with float features as well).
In general, I would advise setting the input structure and feature matrices as np.array with float types, to be sure that you do not get degenerate intermediate transport plans because of the dtype. Then, indeed, it can be a matter of a poor local optimum caused by a poor initialization of the barycenter.
Thank you for your answer and tips.
import numpy as np
import networkx as nx
from ot.gromov import fgw_barycenters

n = 5
g1 = nx.erdos_renyi_graph(n, 0.5)
g2 = nx.erdos_renyi_graph(n, 0.5)
c1 = nx.adjacency_matrix(g1).toarray().astype(np.float64)
c2 = nx.adjacency_matrix(g2).toarray().astype(np.float64)
init_c = np.random.random((n, n)).astype(np.float64)
f = np.random.random((2, n, 2)).astype(np.float64)  # two graphs, n nodes, 2-d features
_, bary1, log = fgw_barycenters(n, f, np.array([c1, c2]), alpha=.5, init_C=init_c, symmetric=True, log=True)
print(bary1)
print(log)
The output is:
[[0.36 0.36 0.36 0.36 0.36]
[0.36 0.36 0.36 0.36 0.36]
[0.36 0.36 0.36 0.36 0.36]
[0.36 0.36 0.36 0.36 0.36]
[0.36 0.36 0.36 0.36 0.36]]
{'err_feature': [1.8001614649348003, 0.0], 'err_structure': [1.7216815043238423, 0.0], 'Ts_iter': `...`
For 'Ts_iter' I have 3 matrices with 0.04 in all entries, and the transport plan 'T' is also a matrix containing only 0.04. For 'Ms' I get:
Ms: [array([[0.04697823, 0.04457876, 0.03974605, 0.48802576, 0.12923119],
[0.04697823, 0.04457876, 0.03974605, 0.48802576, 0.12923119],
[0.04697823, 0.04457876, 0.03974605, 0.48802576, 0.12923119],
[0.04697823, 0.04457876, 0.03974605, 0.48802576, 0.12923119],
[0.04697823, 0.04457876, 0.03974605, 0.48802576, 0.12923119]]), array([[0.15718075, 0.1456845 , 0.0750637 , 0.1852994 , 0.24760472],
[0.15718075, 0.1456845 , 0.0750637 , 0.1852994 , 0.24760472],
[0.15718075, 0.1456845 , 0.0750637 , 0.1852994 , 0.24760472],
[0.15718075, 0.1456845 , 0.0750637 , 0.1852994 , 0.24760472],
[0.15718075, 0.1456845 , 0.0750637 , 0.1852994 , 0.24760472]])]
I don't think this is particularly useful to you. Can you provide me with a sample where this function works as expected for you (preferably across multiple runs), so I can test it locally? Thanks in advance.
Hello @youssef62,
Thank you for your feedback. After running my own experiments, better default settings for these functions should clearly be considered to avoid poor local optima:
i) By default we set the features of the FGW barycenter to zero, which makes the first computations of the FGW distances independent of the barycenter features. For your own experiments I advise you to set init_X different from 0, e.g. a matrix of ones or values uniformly sampled in the range of the observed features. A good option used in papers, if affordable, would be to perform k-means on the concatenated input features, with k being the number of nodes in the barycenter (see the sketch below).
ii) warmstart=False clearly seems to have a negative impact on the solver. I advise you to set warmstart=True. I will soon fix this in a PR.
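For instance, here is a minimal sketch of that k-means initialization, assuming scikit-learn is available; kmeans_init_X is a hypothetical helper, and fs stands for the list of input feature matrices (like f in the example above):

import numpy as np
from sklearn.cluster import KMeans

def kmeans_init_X(fs, n_bary, seed=0):
    # stack the node features of all input graphs into one (sum of n_i, d) matrix
    stacked = np.concatenate([np.asarray(f_s, dtype=np.float64) for f_s in fs], axis=0)
    # one cluster per barycenter node; the centers serve as initial features
    km = KMeans(n_clusters=n_bary, n_init=10, random_state=seed).fit(stacked)
    return km.cluster_centers_  # shape (n_bary, d), to be passed as init_X

# e.g. with the example above: init_X = kmeans_init_X([f[0], f[1]], n)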
Remark: you should not provide an asymmetric matrix init_C while setting 'symmetric=True'.
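With the snippet above, a quick way to comply is to symmetrize the random initialization before passing it:

init_c = np.random.random((n, n))
init_c = (init_c + init_c.T) / 2  # symmetric init, consistent with symmetric=True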
Best, Cédric
I like the ability to use a k-means init, we should do that, or at least provide it as a possible init!
I ran some more tests and this issue comes up much less often in version 0.8.2 than in 0.9.1 (approximately half of the time, as opposed to always, using the code above).
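For reference, this is roughly the loop I used to estimate those rates (a sketch only; keyword availability, such as symmetric, may differ between 0.8.2 and 0.9.1):

import numpy as np
import networkx as nx
from ot.gromov import fgw_barycenters

def is_homogeneous(n=5, seed=0):
    # rebuild the example above with a fresh random instance
    rng = np.random.default_rng(seed)
    c1 = nx.adjacency_matrix(nx.erdos_renyi_graph(n, 0.5, seed=seed)).toarray().astype(np.float64)
    c2 = nx.adjacency_matrix(nx.erdos_renyi_graph(n, 0.5, seed=seed + 1)).toarray().astype(np.float64)
    init_c = rng.random((n, n))
    f = rng.random((2, n, 2))
    _, bary, _ = fgw_barycenters(n, f, np.array([c1, c2]), alpha=.5,
                                 init_C=init_c, symmetric=True, log=True)
    return np.allclose(bary, bary.flat[0])  # True if every entry is identical

rate = np.mean([is_homogeneous(seed=s) for s in range(50)])
print(f"homogeneous barycenter in {rate:.0%} of runs")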
We did some debugging of the optimization around 0.9, so it should work better (maybe it stays stuck at the initialization point more often because of this). @cedricvincentcuaz, we need to look into this please (in addition to the new init).
Hello @youssef62 and @rflamary,
So, building on the previous examples, I compared the GW and FGW barycenter solvers in many settings in both versions 0.8.2 and 0.9.1. In all cases there can be slight differences in the solvers' results between the two versions because of different rounding errors. This is because we removed some unnecessary computations in the fgw and gw solvers after 0.9, which considerably increased speed. This is the only difference between the two versions when compared with exactly the same hyper-parameters; it affects both the gw and fgw barycenter solvers, and it can randomly allow the solver to avoid being stuck at the first BCD iteration when a local optimum is provided at initialization.
Potential improvements:
Thanks @cedricvincentcuaz, I think your analysis is spot on. The way it is now, init_C is completely ignored because we update C with T, and T is initialised independently from init_C.
In my local tests I updated T before C, so the new T takes init_C into account. I put the following block:
# update the transport plans first, so the current (possibly initial) C is taken into account
if warmstartT:
    T = [fused_gromov_wasserstein(
        Ms[s], C, Cs[s], p, ps[s], loss_fun=loss_fun, alpha=alpha, armijo=armijo, symmetric=symmetric,
        G0=T[s], max_iter=max_iter, tol_rel=1e-5, tol_abs=0., verbose=verbose, **kwargs) for s in range(S)]
else:
    T = [fused_gromov_wasserstein(
        Ms[s], C, Cs[s], p, ps[s], loss_fun=loss_fun, alpha=alpha, armijo=armijo, symmetric=symmetric,
        G0=None, max_iter=max_iter, tol_rel=1e-5, tol_abs=0., verbose=verbose, **kwargs) for s in range(S)]
Before this one:
if not fixed_structure:
    T_temp = [t.T for t in T]
    if loss_fun == 'square_loss':
        C = update_square_loss(p, lambdas, T_temp, Cs)
    elif loss_fun == 'kl_loss':
        C = update_kl_loss(p, lambdas, T_temp, Cs)
I then ran some experiments and the bug seems to be gone. I think it makes sense to do the same for the features too.
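For the features, a sketch of the analogous reordering could look like the following (the helper names update_feature_matrix and dist follow the surrounding POT source, but the exact call is my assumption, not a verified diff):

# recompute the barycenter features only after T has been refreshed,
# so that init_X is not silently discarded at the first BCD iteration
if not fixed_features:
    Ys_temp = [y.T for y in Ys]
    X = update_feature_matrix(lambdas, Ys_temp, T, p).T
    Ms = [dist(X, Ys[s]) for s in range(len(Ys))]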
The PR has been merged, I'm closing the issue, thanks to you both.
Describe the bug
The fgw_barycenters function often (in more than half of my tests) returns a matrix whose entries are all the same. Is this considered normal behavior? When applied to graphs, I always get a complete graph with self-loops.
Expected behavior
To get a non-homogeneous matrix more often.
Environment (please complete the following information):
How was POT installed (source, pip, conda): pip
Output of the following code snippet:
macOS-12.5.1-arm64-arm-64bit
Python 3.11.5 (main, Aug 24 2023, 15:09:32) [Clang 14.0.0 (clang-1400.0.29.202)]
NumPy 1.26.0
SciPy 1.11.3
POT 0.9.1
Additional context
I tried different values of alpha, even 1, which completely ignores the features. I also tested with different features (not only all ones) and I still have the same issue. I suppose this has to do with the fact that it can converge to a local minimum, but I don't think it is supposed to happen this often.
I was not sure whether this should be reported as an issue or as a discussion post.