RoheLab / vsp

Vintage Sparse PCA for Semi-Parametric Network Analysis
https://rohelab.github.io/vsp/dev

How to handle non/weak-diagonal hat_B? #69

Closed 2 years ago by ravwojdyla

ravwojdyla commented 2 years ago

This issue is related to:

https://github.com/RoheLab/vsp/issues/36 describes potential issues with a non- or weak-diagonal $\hat{B}$, specifically the connection between loadings and PCs. I would love to understand whether there's any "standard" or recommended way to handle this situation.

Just FYI, I'm working with a Python implementation, but I'm getting exactly the same results for $\hat{B}$, so it's fine to assume R if that's easier.

alexpghayes commented 2 years ago

It could just mean that your data doesn't have a clear mixture structure. A concrete data example will help a lot.

ravwojdyla commented 2 years ago

I'm sorry, I can't share the data I'm working with, but maybe the issue is visible even with dummy Iris data?

library(datasets)
library(vsp)

data(iris)

# scale the four numeric columns, then fit a rank-3 vsp and extract B
x <- scale(data.matrix(iris[1:4]), scale = TRUE)
b_hat <- vsp(x, rank = 3, degree_normalize = FALSE)$B
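To view it as a heatmap, something like this works in R (a sketch; the choice of heatmap function is arbitrary):

# quick look at b_hat as a heatmap, without rescaling or reordering rows/columns
heatmap(as.matrix(b_hat), Rowv = NA, Colv = NA, scale = "none")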

[Image: heatmap of $\hat{B}$]

alexpghayes commented 2 years ago

It might help if you could say how this B violates your expectations, or which inference you are drawing from it that seems misleading.

ravwojdyla commented 2 years ago

@alexpghayes for context, I'm trying to evaluate VSP for two separate use cases, which use different data. While doing some research I stumbled upon your VSP paper, and I'm very excited about its potential. I would appreciate any hints, but please also let me know if I'm asking too many questions (I don't want to be a burden). I really appreciate your work.

alexpghayes commented 2 years ago

Questions are very welcome! So, I would start by noting that you can have clearly interpretable Z and Y and a non-diagonal B. A diagonal B indicates that you have factor structure, and a non-diagonal B indicates that you have co-factor/bi-factor structure. Factor structure is much nicer to think about, but co-factor structure is also fine, just more complex. Do you like the Z and Y that you are getting?
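For intuition: in the notation of the paper, vsp computes a low-rank approximation

$$\hat{A} \approx \hat{Z} \hat{B} \hat{Y}^\top$$

so a diagonal $\hat{B}$ means each column of $\hat{Z}$ pairs off with exactly one column of $\hat{Y}$, while off-diagonal entries of $\hat{B}$ mix them.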

Before trying to interpret B (or Y or Z) I also typically look at pairs plots (stats::pairs() in R) to check for radial streaking, as in the paper. If there isn't radial streaking, varimax might not help that much. The next thing I do is look at a screeplot to get an idea of the effective dimensionality of the data. Oftentimes it isn't clear what rank decomposition to use and so I try results for several different ranks and see if they tell a coherent story.
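As a rough sketch of that workflow, using the Iris example from above (the rank and the base-R helpers here are illustrative choices, not the only options):

library(vsp)

x <- scale(data.matrix(iris[1:4]))

# screeplot: plot the singular values and look for an elbow to gauge
# the effective dimensionality of the data
s <- svd(x)
plot(s$d, type = "b", ylab = "singular value")

# fit at a candidate rank, then check pairs plots for radial streaking
fa <- vsp(x, rank = 3, degree_normalize = FALSE)
pairs(as.matrix(fa$Z), main = "Z")
pairs(as.matrix(fa$Y), main = "Y")

# refit at several ranks and see whether the factors tell a coherent story
for (k in 2:4) {
  fa_k <- vsp(x, rank = k, degree_normalize = FALSE)
  print(round(as.matrix(fa_k$B), 2))
}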

ravwojdyla commented 2 years ago

> Questions are very welcome!

Thank you!

> Before trying to interpret B (or Y or Z) I also typically look at pairs plots (stats::pairs() in R) to check for radial streaking, as in the paper. If there isn't radial streaking, varimax might not help that much. The next thing I do is look at a screeplot to get an idea of the effective dimensionality of the data.

Right, I've been doing exactly that (so that's reassuring to hear). By the way, https://github.com/karlrohe/spectral_workshop/raw/master/spectral_workshop.pdf was a great read and I've been following those tips (do you know if there was a recording of this workshop?). Regarding radial streaking, it's not as clear as in the paper (but that's expected). In my case I "scale" the data (sklearn's StandardScaler, the equivalent of R's scale) and then perform VSP (notably, I do not "double center" the data):

[Image: pairs plot of $\hat{Z}$]

[Image: pairs plot of $\hat{Y}$]

The number 30 came from the screeplot (thresholding at an eigenvalue of 1, sketched below, which I know might not be the optimal way to choose $k$). Assigning meaning to the factors is reasonable (honestly, I would expect, and prefer, slightly fewer factors), but when I looked at

[Image: heatmap of $\hat{B}$]

and read the other issues, I got a bit confused about how to connect $\hat{Z}$ and $\hat{Y}$ if $\hat{B}$ is non-diagonal (which seems to be the case for me).
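(For concreteness, the "threshold at 1 eigenvalue" rule I mentioned above is just a Kaiser-style cutoff; roughly, with x the scaled data matrix:)

# keep components whose correlation-matrix eigenvalues exceed 1
ev <- eigen(cor(x), symmetric = TRUE)$values
k_hat <- sum(ev > 1)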

I know it's very hard to say without access to data, but I wonder if you have any immediate comments/hints about those graphs?

So in this case, say I assign some meaning to each of the 30 factors in $\hat{Y}$, and then I look at $\hat{Z}$ and take a random sample from it, which has 30 PCs/"features". With a non-diagonal $\hat{B}$ (co-factor structure), how would I go about interpreting those values so that they are intuitive to some "end user"?

> Oftentimes it isn't clear what rank decomposition to use and so I try results for several different ranks and see if they tell a coherent story.

Do you decide this by how coherent the factors are, or is there a more systematic approach? I have also seen https://github.com/RoheLab/gdim.

alexpghayes commented 2 years ago

So my advice here is not to worry about B too much, especially for rectangular data like yours. For rectangular data, vsp returns Z, which describes the "topics" or "community memberships" of each node, and Y, which defines the topics or communities themselves. In your case, Y looks great -- you have super clear radial streaking that is nicely aligned with the basis vectors. This tells me, roughly, that you have clear topics in the data. Z is pretty blob-y in the plots above, which means that rows are loading on more than one topic at a time. B just describes how Z and Y are related. Here B looks roughly diagonal to me, with some off-diagonal structure, but not very much in practice. I would interpret this to mean that rows loading on Z1 are predominantly members of topic Y1, but some topics are tightly tied to each other (i.e. membership in one topic is highly associated with membership in another when B[i, j] is large and positive), and some topic memberships are anti-correlated (i.e. when B[i, j] is large and negative).
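If it helps, here is one rough way to surface that kind of structure (a sketch, not a vsp function; fa is a fitted vsp object and the threshold is arbitrary):

B <- as.matrix(fa$B)

# flag off-diagonal entries that are large relative to the diagonal:
# large positive B[i, j] suggests membership in Z_i goes with topic Y_j,
# large negative B[i, j] suggests the memberships are anti-correlated
thresh <- 0.5 * max(abs(diag(B)))
big <- which(abs(B) > thresh & row(B) != col(B), arr.ind = TRUE)
data.frame(z_factor = big[, "row"], y_factor = big[, "col"], value = B[big])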

Nothing here looks concerning to me -- I would be pretty happy to see this in my own data and would start investigating the meaning of Z and Y, perhaps playing around with different values of k for a sensitivity check.

ravwojdyla commented 2 years ago

@alexpghayes thank you so much, this is very useful. If you don't mind, I would like to keep this issue open for a couple more days in case there are further questions.

kennyjoseph commented 2 years ago

Hey @alexpghayes, I have two quick follow-ups as a more applied researcher trying to use vsp (apologies in advance if my understanding is off)...

alexpghayes commented 2 years ago

> I believe I understand that if the returned B does not look strongly diagonal, then to get the "locations of Z on the factors", I need to multiply Z by B? EDIT: I guess this would be for the hard clustering (non-centered) perspective only? Otherwise I would assume I can interpret B, thinking about the LDA analogy, as a kind of topic correlation structure (a la the CTM).

In general, interpreting B is difficult. If it looks weird, it's probably still fine, at least in my experience. We don't have a super great way to interpret B when B does not correspond to the block-connection probability matrix in an SBM / mixture model, so it's not even clear how to tell when B is bad outside that context. The LDA analogy seems like an okay approach, but I wouldn't lean into it too much.
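(If you do want to try the multiplication from your first question, it is just the following, where fa is a fitted vsp object; I would not lean on it too heavily for the reasons above:)

# mix the row loadings through B to get "locations of Z on the factors"
ZB <- as.matrix(fa$Z) %*% as.matrix(fa$B)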

> For non-square data, should I be more worried about B/this problem?

No. The only difference is that, for non-square data, B roughly corresponds to a mixing matrix for a bipartite SBM rather than a unipartite SBM.

In general, I would not worry about B. I would look at Z and Y and make sure you like the factors that you are getting.

kennyjoseph commented 2 years ago

Thanks a lot for the reply @alexpghayes, much appreciated. It's a very cool method that we're enjoying using for a number of different projects!