earmingol / cell2cell

User-friendly tool to infer cell-cell interactions and communication from gene expression of interacting proteins
BSD 3-Clause "New" or "Revised" License

tensor.compute_tensor_factorization result is highly dependent on random_state #13

Closed deepcompbio closed 1 year ago

deepcompbio commented 2 years ago

Hello,

I'm running your GPU-version Tensor_cell2cell demo code and find that the result (the plot of factors) is highly dependent on the random_state in tensor.compute_tensor_factorization when init is set to 'random'. When I set init to 'svd', it runs out of memory. Could you advise on how to generate a converged result, if that is possible? Many thanks.

tensor.compute_tensor_factorization(rank=15,
                                    init='random',
                                    # init='svd',  # requires 1.27 TiB of memory
                                    random_state=888)
earmingol commented 2 years ago

Hi!

Thanks for the question. Unfortunately, tensor decompositions, similar to NMF, are non-deterministic methods, and depending on the randomness of the initialization (random_state) they can output different results. Also, the tensor decomposition behind Tensor-cell2cell outputs permutable factors, so you can get very similar results but with a different order of factors. To deal with this, you can run the decomposition multiple times (changing the runs parameter) and tighten the convergence parameters (the tol and n_iter_max parameters). For example, this command performs a more robust decomposition than the one in the demo:

tensor.compute_tensor_factorization(rank=tensor.rank, # Rank obtained from the elbow analysis; could use any other
                                    init='svd', # or 'random' if you get a memory error
                                    random_state=888, # this is to make sure that you get the same result every time
                                    runs=100, tol=1e-8, n_iter_max=500) # these three parameters ensure more stable results

Also, as indicated in this paper (third paragraph of the section Choosing the Number of Components), increasing the number of factors makes the results more variable across initializations (random_states). To evaluate how stable your results are, you can run the decomposition multiple times with different random_states and evaluate their similarity in a pairwise manner with the function cell2cell.tensor.correlation_index(). Then, from those pairwise comparisons, you can compute the mean and std to get an idea of how different/stable the results are.
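To illustrate the idea, here is a minimal sketch of that pairwise stability check. It is NOT cell2cell's correlation_index() (which properly accounts for factor permutations); it is a simplified stand-in using plain NumPy correlations, with `pairwise_stability` being a hypothetical helper name:

```python
import numpy as np

def pairwise_stability(factor_runs):
    """Mean/std of pairwise factor similarity across decomposition runs.

    factor_runs: list of 2D arrays (features x rank), one per run with a
    different random_state. Simplified stand-in for the pairwise comparison
    done with cell2cell.tensor.correlation_index(); illustration only.
    """
    rank = factor_runs[0].shape[1]
    sims = []
    for i in range(len(factor_runs)):
        for j in range(i + 1, len(factor_runs)):
            a, b = factor_runs[i], factor_runs[j]
            # Cross-correlation block: every factor of run i vs. run j
            corr = np.corrcoef(a.T, b.T)[:rank, rank:]
            # Greedily match each factor to its best counterpart,
            # since factor order can differ between runs
            sims.append(np.abs(corr).max(axis=1).mean())
    return np.mean(sims), np.std(sims)
```

A mean close to 1 with a small std across many random_states suggests the decomposition is stable at that rank.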

Anyway, we are constantly working on improving the robustness of the method, so it could be that in coming versions we implement new ways to perform the decomposition.

I hope this helps.

deepcompbio commented 2 years ago

Hi Erick,

Your reply is very helpful. Thanks a lot.

By increasing runs to 10 and changing tol to 1e-8 and n_iter_max to 500, the results from different random_states became highly similar at a rank (number of factors) of 8.

Further increasing the rank requires a higher number of runs to get similar factors.

And indeed, the order of the similar factors may change with different random_states.
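For anyone hitting the same permutation issue: equivalent factors from two runs can be lined up explicitly before comparing them. Below is a hypothetical helper (not part of cell2cell) that reorders one run's factor matrix to match a reference run, using SciPy's Hungarian algorithm on the absolute factor correlations:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_factor_order(reference, other):
    """Reorder the columns (factors) of `other` to best match `reference`.

    Both arrays are (features x rank). Solves an optimal assignment on
    negative absolute correlations so that equivalent factors line up even
    when two runs return them in a different order. Hypothetical helper
    for illustration; not a cell2cell function.
    """
    rank = reference.shape[1]
    # Cross-correlation block between the two sets of factors
    corr = np.corrcoef(reference.T, other.T)[:rank, rank:]
    # Hungarian algorithm: maximize total matched |correlation|
    _, col_order = linear_sum_assignment(-np.abs(corr))
    return other[:, col_order]
```

After matching, the factor plots from different random_states can be compared side by side in a consistent order.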

Thanks again!

earmingol commented 1 year ago

@deepcompbio I'm happy to hear that this was useful for you :)