Check and fix multicollinearity for association analysis and gRNA assignment

Katsevich-Lab / sceptre

An R package for single-cell CRISPR screen data analysis emphasizing statistical rigor, massive scalability, and ease of use.

https://katsevich-lab.github.io/sceptre/

GNU General Public License v3.0


Check and fix multicollinearity for association analysis and gRNA assignment #146

Open nhu-github opened 2 months ago

nhu-github commented 2 months ago

> sceptre_object <- set_analysis_parameters(
+   sceptre_object = sceptre_object,
+   discovery_pairs = discovery_pairs,
+   positive_control_pairs = positive_control_pairs,
+   side = "right",
+   resampling_mechanism = "permutations"
+ )
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

I have tried many approaches but still cannot fix this error. I wonder whether I missed some subtle change.

The sample data works, but my own data does not, so I suspect a data format issue.
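For reference, base R raises this exact message when a model formula contains a factor with only one level. The snippet below is a minimal sketch in base R, not sceptre code: the `covariates` table and its values are hypothetical, and it only illustrates one way such an error can arise and how to spot the offending column.

```r
# Toy covariate table (hypothetical values): `batch` is a factor with one level.
covariates <- data.frame(
  response_n_umis = c(1200, 950, 1800, 1430),
  batch = factor(c("b1", "b1", "b1", "b1"))
)

# Building a design matrix from a one-level factor fails; capture the message.
err <- tryCatch(
  model.matrix(~ log(response_n_umis) + batch, data = covariates),
  error = function(e) conditionMessage(e)
)
print(err)

# List factor or character columns with fewer than two distinct observed values.
is_categorical <- vapply(
  covariates,
  function(x) is.factor(x) || is.character(x),
  logical(1)
)
n_distinct <- vapply(
  covariates[is_categorical],
  function(x) length(unique(x)),
  integer(1)
)
single_level <- names(n_distinct)[n_distinct < 2]
print(single_level)
```

Running the same check on the cell-level covariates of a real dataset would show whether a constant categorical covariate is involved.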

nhu-github commented 2 months ago

OK, adding formula_object resolved this issue.

ekatsevi commented 2 months ago

Could you elaborate on how formula_object helped you resolve your issue?

nhu-github commented 1 month ago

I ran a Nextflow pipeline using the default data from sceptredata. All the code worked well until I put in my own dataset. Defining the following formula_object, which I found in the tutorial, eliminated the error:

formula_object = formula(~ log(response_n_nonzero) + log(response_n_umis) + log(grna_n_nonzero) + log(grna_n_umis) + response_p_mito + batch)

The same error then occurred when I ran assign_grnas using the mixture method on my high-MOI data, and it was also eliminated by the same formula_object. So my question is: should I use the same formula for grna_assignment_formula when running assign_grnas?

timothy-barry commented 1 month ago

Hi,

I am not totally sure I understand. Is the issue that you passed a particular formula_object to grna_assignment_formula and you encountered a bug?

nhu-github commented 1 month ago

No, the problem occurred with the default settings on my own data (without passing formula_object). After I passed formula_object, the code ran without errors.

Saranya-Balachandran commented 2 weeks ago

Dear Timothy, I am facing the same issue described above. I set formula_object using set_analysis_parameters, and the formula appears when I print the object, but gRNA assignment only works if I pass the formula again, as in the command below:

sceptre_object_highmoi_mixture <- assign_grnas(
  sceptre_object = sceptre_object,
  method = "mixture",
  parallel = TRUE,
  formula_object = formula(formula)
)

The formula is exactly the same one that I provided to set_analysis_parameters.

Why is this a requirement?

ekatsevi commented 1 week ago

Hi Saranya, thanks for bringing this issue to our attention. By default, sceptre does not use the same formula object for association testing and for gRNA assignment. The reason is that some extra covariates, like grna_n_umis and grna_n_nonzero, are usually good to include for gRNA assignment, so they are added by default when running gRNA assignment. However, the fact that you are getting an error when doing so suggests that, on your data, including these additional covariates causes perfect multicollinearity. To resolve this issue, we'll need to implement an update that checks for this. In the meantime, I recommend you continue running the code by re-specifying the formula during gRNA assignment.
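Until such a check exists, perfect multicollinearity can be tested by hand by comparing the rank of the design matrix with its number of columns. The following is a minimal base R sketch under stated assumptions, not sceptre's implementation: `check_full_rank` is a hypothetical helper, and the simulated `covariates` table deliberately makes grna_n_umis an exact multiple of grna_n_nonzero, so its logarithm is a linear combination of the intercept and log(grna_n_nonzero).

```r
set.seed(1)
n <- 50

# Simulated cell-level covariates (hypothetical values).
covariates <- data.frame(
  response_n_umis = rpois(n, 2000) + 1,
  grna_n_nonzero = rpois(n, 5) + 1
)
# Degenerate case: every detected gRNA contributes exactly 3 UMIs, so
# log(grna_n_umis) = log(3) + log(grna_n_nonzero).
covariates$grna_n_umis <- 3 * covariates$grna_n_nonzero

# Hypothetical helper: build the design matrix and compare rank to column count.
check_full_rank <- function(formula_object, data) {
  X <- model.matrix(formula_object, data = data)
  r <- qr(X)$rank
  list(n_columns = ncol(X), rank = r, full_rank = (r == ncol(X)))
}

# Formula with both gRNA covariates: rank deficient on this data.
full <- check_full_rank(
  ~ log(response_n_umis) + log(grna_n_nonzero) + log(grna_n_umis),
  covariates
)

# Dropping the redundant covariate restores full rank.
reduced <- check_full_rank(
  ~ log(response_n_umis) + log(grna_n_nonzero),
  covariates
)

print(full$full_rank)
print(reduced$full_rank)
```

If the formula with the extra gRNA covariates is rank deficient while the reduced one is not, re-specifying the formula during gRNA assignment, as recommended above, sidesteps the problem.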