Open jamesnemesh opened 3 years ago
Hi @jamesnemesh,
thanks for your interest in the analysis! I doubt that the processed data will be made available, but you can find all the preprocessing here: https://github.com/giovp/latent_factors_autoimmune/tree/master/src/preprocessing You'll notice it's just standard SingleCellExperiment normalization + clustering.
> I'm guessing you're actually regressing against the median latent factor scores for the cell label (or similar), but having the data structure you're loading in would let me understand your methods far more completely.
Yes, pretty much. Let me be more specific:
`y` is the pathway activity score, and `cluster` is the cluster label (handled as categorical internally). `y` is not just one latent factor, but the median aggregate of n of them, which we find by clustering the loadings. The first cartoon in figure 2 makes this clear: several factors that share correlated weights are median-aggregated into a single "factor", which we call the pathway activity. The step that clusters factors into pathway activities was a bit heuristic, see https://github.com/giovp/latent_factors_autoimmune/blob/e09adf98afc5f1323bf67457b041acc840e74f23/src/assignLoadings/assignLoadings.R#L65 and https://github.com/giovp/latent_factors_autoimmune/blob/e09adf98afc5f1323bf67457b041acc840e74f23/src/assignLoadings/utils.R#L51 I think there could be a better way to do it. I should mention that similar ideas have been explored in https://elifesciences.org/articles/43803 where they adopted a similar aggregation strategy (although across iterations and not across factors).
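To make the description above concrete, here is a minimal sketch in plain Python. It is not the code from the repository (which is in R). It assumes the aggregation is a per-cell median over the scores of the factors in one group, and all names and numbers (`factor_scores`, `factor_groups`, `pathway_A`, the toy values) are made up for illustration. With a single categorical predictor and treatment coding, ordinary least squares reduces to per-cluster means, which is what `regress_on_cluster` computes.

```python
from statistics import mean, median


def pathway_activity(factor_scores, factor_groups):
    """Median-aggregate per-cell factor scores within each group of factors.

    factor_scores: dict mapping factor name -> list of per-cell scores
    factor_groups: dict mapping pathway name -> list of factor names
    """
    activity = {}
    for pathway, factors in factor_groups.items():
        # zip(*...) yields, for each cell, the scores of all factors in the group
        per_cell = zip(*(factor_scores[f] for f in factors))
        activity[pathway] = [median(cell) for cell in per_cell]
    return activity


def regress_on_cluster(y, cluster):
    """OLS of y on one categorical predictor, treatment-coded.

    The intercept is the mean of the reference (first sorted) level and
    every other coefficient is that level's mean minus the reference mean.
    """
    levels = sorted(set(cluster))
    means = {
        lvl: mean(v for v, c in zip(y, cluster) if c == lvl) for lvl in levels
    }
    ref = levels[0]
    coefs = {"(Intercept)": means[ref]}
    for lvl in levels[1:]:
        coefs[f"cluster{lvl}"] = means[lvl] - means[ref]
    return coefs


# Toy data: three hypothetical factors grouped into one pathway, four cells
factor_scores = {
    "F1": [0.1, 0.2, 0.9, 1.0],
    "F2": [0.3, 0.2, 0.7, 1.2],
    "F3": [0.2, 0.5, 0.8, 1.1],
}
factor_groups = {"pathway_A": ["F1", "F2", "F3"]}
cluster = ["c1", "c1", "c2", "c2"]

activity = pathway_activity(factor_scores, factor_groups)
coefs = regress_on_cluster(activity["pathway_A"], cluster)
print(coefs)
```

In this toy example the `clusterc2` coefficient is roughly 0.75, i.e. the pathway activity is higher in cluster `c2` than in the reference cluster `c1`.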
Hope this is clear, happy to answer any other question!
Best, Giovanni
That's super helpful, thank you for getting back to me so quickly! OK, it really was what was described in the paper: using categorical labels as predictors of the pathway activity, which for some reason I thought was "too simple", but it makes sense. The clarification is great, and that additional reference is appreciated!
No problem at all, happy to help! Indeed, it's a very simple approach (maybe too simple?). I'd argue that since it boils down to just a regression of the pathway activity on the cluster labels, more powerful ideas revolving around GLMs could be used, e.g. including additional covariates or other likelihoods.
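As a rough illustration of the "additional covariates" idea, the sketch below extends the cluster-only regression with one numeric per-cell covariate and fits it by ordinary least squares through the normal equations. It covers only the Gaussian case, not other likelihoods. The covariate, the data and the function names are hypothetical, and this is not something the repository implements.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (full-rank design matrix).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(y, cluster, covariate):
    """OLS of y on treatment-coded cluster labels plus one numeric covariate."""
    levels = sorted(set(cluster))
    names = ["(Intercept)"] + [f"cluster{l}" for l in levels[1:]] + ["covariate"]
    # One row per cell: intercept, dummy per non-reference cluster, covariate
    design = [
        [1.0] + [1.0 if c == l else 0.0 for l in levels[1:]] + [float(x)]
        for c, x in zip(cluster, covariate)
    ]
    p = len(names)
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return dict(zip(names, solve_linear_system(xtx, xty)))


# Toy data generated exactly as y = 1 + 2 * [cluster == "c2"] + 0.5 * covariate
cluster = ["c1", "c1", "c1", "c2", "c2", "c2"]
covariate = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
y = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]

fit = fit_ols(y, cluster, covariate)
print(fit)
```

Because the toy response is an exact linear function of the predictors, the fit recovers the generating coefficients up to floating-point error. The cluster coefficient is now the cluster effect after adjusting for the covariate.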
Hi! Excellent paper with very thoughtful consideration of how to leverage latent factor analysis to understand how factors map onto biological pathways.
Our lab is very interested in reproducing some of your methodology. The code as given is very helpful in understanding some of the more fine-grained details from the manuscript, but at times the R code is hard to follow because it reads in RDS-serialized data whose structure other people can't see. For example, if I wanted to better understand assignLoadings.R, having the files in RA_pipeline would make it much easier to step through your code and understand the section of the methods that says "each pathway activity was set as the response variable in a regression setting where the cluster labels function as the predictor". I'm guessing you're actually regressing against the median latent factor scores for the cell label (or similar), but having the data structure you're loading in would let me understand your methods far more completely.
Would it be possible to release some of the data that's loaded in by the scripts, at least in cases where the processed data was generated by you, as opposed to the primary data you downloaded from other labs (which, of course, I'd expect to download myself if I wanted to reproduce that part of the analysis)?
Thanks for your attention.