PythonOT / POT

POT : Python Optimal Transport
https://PythonOT.github.io/
MIT License

fgw barycenter often returns matrix with same entries #547

Closed. youssef62 closed this issue 1 year ago

youssef62 commented 1 year ago

Describe the bug

The fgw_barycenters function often (in more than half of my tests) returns a matrix whose entries are all identical. Is this considered normal behavior? When applied to graphs, this means I always get a complete graph with self-loops.

Screenshots

(screenshot of the returned constant matrix omitted)

Code sample


import networkx as nx
from ot.gromov import fgw_barycenters

n = 50
g1 = nx.erdos_renyi_graph(n, 0.5)
g2 = nx.erdos_renyi_graph(n, 0.5)

c1 = nx.adjacency_matrix(g1).toarray()
c2 = nx.adjacency_matrix(g2).toarray()

# constant (all-ones) node features; the barycenter structure is the second return value
_, bary1 = fgw_barycenters(n, [[[1]] * n] * n, [c1, c2], alpha=0.5)

print(bary1)

Expected behavior

To get a non-homogeneous matrix more often.

Environment (please complete the following information):

Output of the following code snippet:

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import ot; print("POT", ot.__version__)

macOS-12.5.1-arm64-arm-64bit
Python 3.11.5 (main, Aug 24 2023, 15:09:32) [Clang 14.0.0 (clang-1400.0.29.202)]
NumPy 1.26.0
SciPy 1.11.3
POT 0.9.1

Additional context

I tried different values of alpha, even 1, which completely ignores the features. I also tested with different features (not only all ones) and I still have the same issue. I suppose this has to do with the solver converging to a local minimum, but I don't think it is supposed to happen this often.
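For what it's worth, a quick way to quantify how often the degenerate constant barycenter shows up is to re-run the solver with several random symmetric init_C matrices and count the constant outputs. This is only a minimal sketch, assuming the fgw_barycenters signature used above plus its documented init_C keyword; the graph sizes, seeds and the number of repetitions are arbitrary illustration choices.

import numpy as np
import networkx as nx
from ot.gromov import fgw_barycenters

n = 20
rng = np.random.RandomState(0)
g1 = nx.erdos_renyi_graph(n, 0.5, seed=1)
g2 = nx.erdos_renyi_graph(n, 0.5, seed=2)
Cs = [nx.adjacency_matrix(g).toarray().astype(np.float64) for g in (g1, g2)]
Ys = [np.ones((n, 1), dtype=np.float64)] * 2  # constant features, as in the report

n_constant = 0
for _ in range(10):
    # random symmetric initialization of the barycenter structure
    C0 = rng.random_sample((n, n))
    C0 = 0.5 * (C0 + C0.T)
    _, bary = fgw_barycenters(n, Ys, Cs, alpha=0.5, init_C=C0)
    n_constant += np.allclose(bary, bary.flat[0])  # all entries (nearly) identical?
print(f"constant barycenter in {n_constant}/10 runs")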

I did not really know whether this should be reported as an issue or as a discussion post.

cedricvincentcuaz commented 1 year ago

Hello @youssef62, I believe it is exactly the same principle as in your previously opened issue #530 ...

youssef62 commented 1 year ago

> Hello @youssef62, I believe it is exactly the same principle as in your previously opened issue #530 ...

I am not sure of that, because I don't get integers as output this time. I also still get the same result even after adding astype(np.float64) to c1 and c2 (and with float features as well).

cedricvincentcuaz commented 1 year ago

By default, I would advise setting the input structure and feature matrices as np.array with float dtypes, to be sure that you do not get degenerate intermediate transport plans because of the type. Then, indeed, it can be a matter of a poor local optimum caused by a poor initialization of the barycenter.

youssef62 commented 1 year ago

Thank you for your answer and tips.

import numpy as np
import networkx as nx
from ot.gromov import fgw_barycenters

n = 5
g1 = nx.erdos_renyi_graph(n, 0.5)
g2 = nx.erdos_renyi_graph(n, 0.5)

c1 = nx.adjacency_matrix(g1).toarray().astype(np.float64)
c2 = nx.adjacency_matrix(g2).toarray().astype(np.float64)

init_c = np.random.random((n, n)).astype(np.float64)

# 2 graphs, n nodes each, 2-dimensional random float features
f = np.random.random((2, n, 2)).astype(np.float64)

_, bary1, log = fgw_barycenters(n, f, np.array([c1, c2]), alpha=.5, init_C=init_c, symmetric=True, log=True)

print(bary1)
print(log)

The output is:

[[0.36 0.36 0.36 0.36 0.36]
 [0.36 0.36 0.36 0.36 0.36]
 [0.36 0.36 0.36 0.36 0.36]
 [0.36 0.36 0.36 0.36 0.36]
 [0.36 0.36 0.36 0.36 0.36]]
{'err_feature': [1.8001614649348003, 0.0], 'err_structure': [1.7216815043238423, 0.0], 'Ts_iter': `...`

For 'Ts_iter' I have 3 matrices with 0.04 in all entries; for the transport map 'T' it is also a matrix with only 0.04. For 'Ms' I get:

Ms: [array([[0.04697823, 0.04457876, 0.03974605, 0.48802576, 0.12923119],
       [0.04697823, 0.04457876, 0.03974605, 0.48802576, 0.12923119],
       [0.04697823, 0.04457876, 0.03974605, 0.48802576, 0.12923119],
       [0.04697823, 0.04457876, 0.03974605, 0.48802576, 0.12923119],
       [0.04697823, 0.04457876, 0.03974605, 0.48802576, 0.12923119]]), array([[0.15718075, 0.1456845 , 0.0750637 , 0.1852994 , 0.24760472],
       [0.15718075, 0.1456845 , 0.0750637 , 0.1852994 , 0.24760472],
       [0.15718075, 0.1456845 , 0.0750637 , 0.1852994 , 0.24760472],
       [0.15718075, 0.1456845 , 0.0750637 , 0.1852994 , 0.24760472],
       [0.15718075, 0.1456845 , 0.0750637 , 0.1852994 , 0.24760472]])]

I don't think this is particularly useful to you. Can you provide me with a sample where this function works as expected for you (preferably across multiple runs), so I can test it locally? Thanks in advance.

cedricvincentcuaz commented 1 year ago

Hello @youssef62,

Thank you for your feedback. After doing my own experiments, better default settings for these functions should clearly be considered to avoid poor local optima.

i) By default we set the features of the FGW barycenter to zero, which makes the first computation of the FGW distances independent of the barycenter features. For your own experiments I advise you to set init_X different from 0, e.g. a matrix of ones or features uniformly sampled in the range of the observed features. A good option used in papers, if affordable, would be to perform k-means on the concatenated input features with k being the number of nodes in the barycenter.

ii) 'warmstart=False' clearly seems to have a negative impact on the solver. I advise you to set 'warmstart=True'. I will soon fix this in a PR.

Remark: you should not provide an asymmetric init_C matrix while setting 'symmetric=True'.
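For concreteness, here is a minimal sketch of the two suggestions above together with a symmetric init_C. It is only an illustration, not the maintainers' exact recipe: it assumes scikit-learn's KMeans for the feature initialization and the 0.9.x fgw_barycenters keywords init_X, init_C and warmstartT (the latter being the parameter name corresponding to 'warmstart' above).

import numpy as np
import networkx as nx
from sklearn.cluster import KMeans
from ot.gromov import fgw_barycenters

n = 5
rng = np.random.RandomState(0)
g1 = nx.erdos_renyi_graph(n, 0.5, seed=1)
g2 = nx.erdos_renyi_graph(n, 0.5, seed=2)
c1 = nx.adjacency_matrix(g1).toarray().astype(np.float64)
c2 = nx.adjacency_matrix(g2).toarray().astype(np.float64)

# one (n, d) float feature matrix per input graph
f = [rng.random_sample((n, 2)) for _ in range(2)]

# i) non-zero init_X: k-means centroids of the concatenated input features,
#    with as many clusters as barycenter nodes
init_x = KMeans(n_clusters=n, n_init=10, random_state=0).fit(np.vstack(f)).cluster_centers_

# symmetric random init_C, consistent with symmetric=True
init_c = rng.random_sample((n, n))
init_c = 0.5 * (init_c + init_c.T)

# ii) warm-start the transport plans across the BCD iterations
X, C, log = fgw_barycenters(
    n, f, [c1, c2], alpha=0.5,
    init_C=init_c, init_X=init_x,
    symmetric=True, warmstartT=True, log=True)
print(C)

The k-means initialization matches the "if affordable" option mentioned above; for larger barycenters, a matrix of ones or uniformly sampled features is a cheaper alternative.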

Best, Cédric

rflamary commented 1 year ago

I like the ability to use a k-means init, we should do that or at least provide this as a possible init!

youssef62 commented 1 year ago

I ran some more tests and this issue comes up much less in version 0.8.2 than in 0.9.1 (approximately half of the time, as opposed to always, using the code above).

rflamary commented 1 year ago

We did some debugging of the optimization around 0.9 so it should work better (maybe it stays stuck at the initialization more often because of this), @cedricvincentcuaz we need to look into this please (in addition to the new init).

cedricvincentcuaz commented 1 year ago

Hello @youssef62 and @rflamary,

So, building on the previous examples, I compared the GW and FGW barycenter solvers in many settings in both versions 0.8.2 and 0.9.1. In all cases, there can be slight differences in the solvers' results between the two versions because of different rounding errors. This is because we removed some unnecessary computations in the fgw and gw solvers after 0.9, which considerably increased speed. This is the only difference between both versions when compared with exactly the same hyper-parameters; it affects both the gw and fgw barycenter solvers, and it can randomly allow the solver to avoid getting stuck at the first BCD iteration when a local optimum is provided as initialization.

Potential improvements:

youssef62 commented 1 year ago

Thanks @cedricvincentcuaz, I think your analysis is spot on. The way it is now, init_C is completely ignored, because we update C from T and T is initialised independently of init_C.

In my tests, I locally updated T before C so that the new T takes init_C into account. I put the following block:

if warmstartT:
    T = [fused_gromov_wasserstein(
        Ms[s], C, Cs[s], p, ps[s], loss_fun=loss_fun, alpha=alpha, armijo=armijo, symmetric=symmetric,
        G0=T[s], max_iter=max_iter, tol_rel=1e-5, tol_abs=0., verbose=verbose, **kwargs) for s in range(S)]
else:
    T = [fused_gromov_wasserstein(
        Ms[s], C, Cs[s], p, ps[s], loss_fun=loss_fun, alpha=alpha, armijo=armijo, symmetric=symmetric,
        G0=None, max_iter=max_iter, tol_rel=1e-5, tol_abs=0., verbose=verbose, **kwargs) for s in range(S)]

Before this one :

if not fixed_structure:
    T_temp = [t.T for t in T]
    if loss_fun == 'square_loss':
        C = update_square_loss(p, lambdas, T_temp, Cs)

    elif loss_fun == 'kl_loss':
        C = update_kl_loss(p, lambdas, T_temp, Cs)

I then ran some experiments and the bug seems to be gone now. I think it makes sense to do the same for the features too.

rflamary commented 1 year ago

The PR has been merged, so I'm closing the issue. Thanks to you both.