Choices of upper_rank and tf_type

deepcompbio commented 2 years ago

Hi Erick,

The choice of upper_rank has a significant impact on the automatically determined rank: a higher upper_rank yields a higher selected rank. How should one choose upper_rank for the elbow analysis and, if needed, a manual rank for the final factorization?

And for tf_type, there are four types. Apart from 'non_negative_cp_hals' not allowing a mask, how would one choose among the remaining three factorization methods?

Thanks.

earmingol commented 2 years ago

Hi!

Thanks for this new question, happy to see that you are getting familiar with our tool.

Regarding the elbow analysis, in theory you can raise upper_rank to any number. In practice, running decompositions with a large number of factors demands a lot of memory, which can become a bottleneck and cause an error.

For selecting the number of factors, try to use a realistic upper limit. We set it to just 25 because that is large enough for us. However, if you are willing to handle even more (keeping in mind that the more factors you use, the more factors you have to interpret), you can increase upper_rank. More factors lead to a lower error, and since the elbow analysis is an automated method based on derivatives of the error curve, the number of selected factors will of course change. In this regard, you can always try different approaches to select the number of factors, even manually, depending on what your trade-off is. We are also planning to implement an elbow analysis based on the similarity of decompositions instead of their error, but that will only be available at some point in the future.
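To see why the selected rank moves with upper_rank, here is a minimal sketch on a synthetic error curve. It uses a simple chord-distance elbow heuristic, which is an assumption chosen for illustration and not necessarily the detector Tensor-cell2cell applies; the function names and the 1/rank curve are hypothetical as well.

```python
def toy_error_curve(upper_rank):
    """Synthetic, strictly decreasing error for ranks 1..upper_rank (made-up data)."""
    return [1.0 / rank for rank in range(1, upper_rank + 1)]


def elbow_by_chord_distance(errors):
    """Return the 1-based rank of the point lying farthest below the chord
    that joins the first and last points of the curve.

    Both axes are rescaled to [0, 1] first, so the answer depends on how far
    the curve extends, i.e. on the upper rank.
    """
    n = len(errors)
    if n < 3:
        raise ValueError("need at least three ranks to locate an elbow")
    lowest, highest = min(errors), max(errors)
    span = (highest - lowest) or 1.0  # avoid dividing by zero on a flat curve
    ys = [(e - lowest) / span for e in errors]
    best_rank, best_gap = 1, float("-inf")
    for i, y in enumerate(ys):
        x = i / (n - 1)
        chord_y = ys[0] + (ys[-1] - ys[0]) * x  # height of the chord at x
        gap = chord_y - y                       # how far the curve sits below it
        if gap > best_gap:
            best_rank, best_gap = i + 1, gap
    return best_rank


for upper_rank in (9, 25, 100):
    print(upper_rank, elbow_by_chord_distance(toy_error_curve(upper_rank)))
```

On this toy curve the elbow lands at ranks 3, 5 and 10 for upper ranks of 9, 25 and 100, so the detected elbow grows with the range that is scanned even though the underlying curve is the same.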

About the tf_type parameter: for now it is experimental, since we are trying new decomposition methods for other analyses with Tensor-cell2cell. We recommend using the default option, since that is the one we introduced in the Tensor-cell2cell paper. Nevertheless, the option 'non_negative_cp_hals' is a better algorithm in terms of converging to robust solutions, with the disadvantage that it only works without masks (in other words, only when the tensor is built with the parameter how='inner').
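The link between how='inner' and the absence of a mask can be illustrated with plain Python sets. This is a toy sketch, not cell2cell code: the context names, cell types and helper functions are hypothetical, and it assumes that 'inner' keeps only the elements present in every context while 'outer' keeps every element seen in any context.

```python
def combine_elements(contexts, how):
    """Elements (e.g. cell types) kept when stacking contexts into one tensor."""
    sets = [set(members) for members in contexts.values()]
    if how == "inner":
        return set.intersection(*sets)  # present in every context
    if how == "outer":
        return set.union(*sets)         # present in at least one context
    raise ValueError(f"unsupported how={how!r}")


def needs_mask(contexts, how):
    """True if some context lacks a kept element, leaving missing entries."""
    kept = combine_elements(contexts, how)
    return any(not kept <= set(members) for members in contexts.values())


# Hypothetical contexts with partially overlapping cell types.
contexts = {
    "sample_A": ["T cell", "B cell", "Monocyte"],
    "sample_B": ["T cell", "B cell"],
}

for how in ("inner", "outer"):
    print(how, sorted(combine_elements(contexts, how)), needs_mask(contexts, how))
```

With 'inner', every kept element is measured in every context, so nothing has to be masked, which is the situation 'non_negative_cp_hals' requires according to the reply above.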

I hope this is clear enough; otherwise, let me know.

Erick

deepcompbio commented 2 years ago

Many thanks, Erick. Your reply is very helpful, as always.

The idea of elbow analysis based on the similarity of decompositions sounds promising. Actually I also tried to increase the rank a bit manually to compare the decompositions with those from auto rank. Looking forward to testing your new elbow analysis method in the future.

earmingol commented 2 years ago

Hi @deepcompbio

The elbow analysis based on similarity is now available in v0.6.2, added in this PR: https://github.com/earmingol/cell2cell/pull/17.

The way to use it is to add the parameter metric='similarity' when running:

tensor.elbow_rank_selection(upper_rank=25,
                            runs=10,
                            init='svd',
                            automatic_elbow=True,
                            metric='similarity',
                            random_state=888,
                            )

Also, if the curve looks odd, you can smooth it by passing the parameter smooth=True to the same function.
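Putting both options together, a call could look like the sketch below. The wrapper name run_similarity_elbow is hypothetical, tensor is assumed to be an interaction tensor already built with Tensor-cell2cell, and the remaining argument values are simply copied from the call above.

```python
def run_similarity_elbow(tensor, upper_rank=25, smooth=False):
    """Run the elbow analysis with the similarity metric (hypothetical helper).

    `tensor` is assumed to expose `elbow_rank_selection`, as the interaction
    tensors in this thread do. Whatever that method returns is passed through.
    """
    return tensor.elbow_rank_selection(
        upper_rank=upper_rank,
        runs=10,
        init='svd',
        automatic_elbow=True,
        metric='similarity',  # similarity-based elbow instead of the error-based one
        smooth=smooth,        # set to True if the curve looks odd
        random_state=888,
    )
```

For example, run_similarity_elbow(tensor, smooth=True) requests the smoothed similarity curve with the same upper_rank of 25.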

deepcompbio commented 2 years ago

Hi @earmingol

This is cool and fast. Many thanks for adding this functionality. I will give it a try.

On a related topic, I have recently been experimenting with different ranks in the factorization, from a dozen to a few hundred (until my GPU memory runs out). It seems that a higher number of factors yields a higher number of interesting ligand-receptor interactions (some of them are known in the literature for the particular disease I'm studying). So the question is: how could one determine which rank is sufficient for the factorization from a biological perspective? Thanks.

earmingol commented 1 year ago

That's to be expected: the more factors you use, the better the resolution of your results. As we explained in this post, tensor decomposition approximates the original data while capturing the most prominent patterns. In that same post, if you think of the decomposition of a picture (Fig. 1d-e), you should get a reconstructed picture that looks like the original, and the resolution should improve as you increase the number of factors (in the example we used only 3 factors, but if we had used 100 factors instead, the reconstructed image would look far more similar to the original). That said, there is a trade-off to deal with: how many factors you are willing to interpret versus how much resolution you want. The elbow analysis helps to reach a decent trade-off between the two, but it is not necessarily the most adequate way, and it is always up to you which criteria to use for selecting the number of factors.
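The resolution argument can be checked on a small matrix, used here as a two-dimensional stand-in for the picture example. This is a minimal sketch under stated assumptions: it removes one rank-one component at a time with power iteration (greedy deflation), which is a matrix analogue and not the tensor decomposition Tensor-cell2cell runs; the toy "picture" and all function names are made up for illustration.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def normalize(vector):
    """Scale to unit length; return None for an all-zero vector."""
    norm = math.sqrt(sum(x * x for x in vector))
    return None if norm == 0.0 else [x / norm for x in vector]


def leading_direction(matrix, iterations=100):
    """Approximate the dominant left singular vector by power iteration."""
    columns = list(zip(*matrix))
    # Start from the largest column so the start vector lies in the column space.
    u = normalize(max(columns, key=lambda col: sum(x * x for x in col)))
    for _ in range(iterations):
        if u is None:
            return None
        w = [sum(u[i] * row[j] for i, row in enumerate(matrix))
             for j in range(len(columns))]                  # w = M^T u
        u = normalize([sum(row[j] * w[j] for j in range(len(columns)))
                       for row in matrix])                   # u = M w, rescaled
    return u


def relative_errors(matrix, max_rank):
    """Relative reconstruction error after removing 1..max_rank components."""
    residual = [[float(x) for x in row] for row in matrix]
    total = frobenius_norm(residual)
    errors = []
    for _ in range(max_rank):
        u = leading_direction(residual)
        if u is not None:
            weights = [sum(u[i] * row[j] for i, row in enumerate(residual))
                       for j in range(len(residual[0]))]
            # Subtract the rank-one piece u * weights from the residual.
            residual = [[x - u[i] * weights[j] for j, x in enumerate(row)]
                        for i, row in enumerate(residual)]
        errors.append(frobenius_norm(residual) / total)
    return errors


# Toy 4x5 "picture" with two bright blocks (made-up values).
picture = [
    [9, 8, 7, 1, 0],
    [8, 9, 6, 2, 1],
    [1, 2, 3, 8, 9],
    [0, 1, 2, 9, 7],
]

for rank, error in enumerate(relative_errors(picture, 4), start=1):
    print(f"rank {rank}: relative error {error:.3f}")
```

The error never increases as components are added and reaches essentially zero once the rank equals the number of rows, which mirrors the trade-off described above: each extra factor buys resolution but is one more pattern to interpret.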

Hope this helps!

deepcompbio commented 1 year ago

@earmingol Thanks for your helpful answer.
