const-ae / lemur

Latent Embedding Multivariate Regression
https://www.bioconductor.org/packages/lemur/

collinearity error with control vs treatment test for multiple subjects #9

Open shobhitagrawal1 opened 11 months ago

shobhitagrawal1 commented 11 months ago

Hi, really interesting work, and thank you for the general ease of use! My data has several subjects, each belonging to either the control or the treatment group, so the call I am trying is lemur(sce, design = ~ subject + condition, n_embedding = 30, test_fraction = 0.5). However, I am getting this error:

Error in handle_design_parameter(design, data, col_data) : The model matrix seems degenerate ('matrix_rank(design_matrix) < ncol(design_matrix)'). Some columns are perfectly collinear. Did you maybe include the same coefficient twice?

My understanding is that the one-hot encoded columns for control and treatment are being flagged as collinear. Could you please tell me how to run a typical multi-subject (assuming the subjects are biological replicates), two-condition analysis?

I appreciate any help. Thank you, Shobhit

const-ae commented 11 months ago

Hi shobhit,

thank you :)

To fit a multi-subject two-condition analysis, set the design to ~ condition (i.e., drop the subject term). This fits a single coefficient explaining the treatment effect for each gene.
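For illustration, the degeneracy can be reproduced in base R with toy data (the subject and condition labels below are hypothetical). When every subject belongs to exactly one condition, the condition column of ~ subject + condition equals the sum of the indicator columns of the treated subjects, so the model matrix loses one rank. Dropping subject removes that dependency. The sketch uses qr() to compute the rank, which may not be exactly how lemur evaluates matrix_rank internally.

```r
# Hypothetical toy design: four subjects, two per condition
col_data <- data.frame(
  subject   = factor(c("s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4")),
  condition = factor(c("ctrl", "ctrl", "ctrl", "ctrl", "trt", "trt", "trt", "trt"))
)

# ~ subject + condition: (Intercept), subjects2, subjects3, subjects4, conditiontrt
X_full <- model.matrix(~ subject + condition, data = col_data)
ncol(X_full)      # 5 columns
qr(X_full)$rank   # 4: conditiontrt is exactly subjects3 + subjects4

# ~ condition alone has full column rank
X_cond <- model.matrix(~ condition, data = col_data)
qr(X_cond)$rank == ncol(X_cond)   # TRUE
```

Because subject is nested within condition, any model that estimates a separate offset per subject can no longer separate the treatment effect from those offsets.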

If you notice that the subject effects are so strong that corresponding cells from different subjects are not aligned after calling align_by_grouping or align_harmony, you can call either method with the argument alignment_design = ~ condition + subject or alignment_design = ~ condition * subject to make the alignment more flexible. However, I advise using a design and an alignment_design that differ only if absolutely necessary, as doing so complicates the interpretation of the effects.

Best, Constantin

shobhitagrawal1 commented 11 months ago

Dear Constantin, thank you very much for the prompt reply, much appreciated. I was also thinking of using just condition for the fit, together with align_by_grouping. My only hesitation concerns the replicates that the neighborhood analysis needs: will that still work if the replicates are not mentioned in the design matrix?

Thank you once again, Shobhit

const-ae commented 11 months ago

Yes. The replicates are specified through the group_by argument of find_de_neighborhoods. Here, you would set group_by = vars(subject, condition).
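To make concrete what treating each subject-condition combination as one replicate means, here is a base-R sketch of pseudobulk aggregation. It assumes, as is common for replicate-aware differential expression tests, that cells are summed per group; find_de_neighborhoods presumably does something along these lines within each neighborhood, but this is not lemur's implementation, and the toy counts and labels are hypothetical.

```r
# Hypothetical toy data: 8 cells from 4 subjects nested in 2 conditions,
# plus a tiny count matrix with genes in rows and cells in columns
col_data <- data.frame(
  subject   = c("s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"),
  condition = c("ctrl", "ctrl", "ctrl", "ctrl", "trt", "trt", "trt", "trt")
)
counts <- matrix(c(1, 0, 2, 3, 0, 1, 4, 2,
                   5, 6, 4, 3, 2, 1, 0, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"), NULL))

# One group per observed subject-condition combination,
# mirroring the intent of group_by = vars(subject, condition)
group <- interaction(col_data$subject, col_data$condition, drop = TRUE)

# Sum the counts of all cells in each group: genes x replicate groups
pseudobulk <- t(rowsum(t(counts), group))
pseudobulk
```

Each of the four columns then acts as one biological replicate, giving two replicates per condition for the downstream test, even though subject never appears in the design.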

shobhitagrawal1 commented 11 months ago

Thanks once again! I will give it a try and get back to you.