cailab-tamu / scTenifoldKnk

R/MATLAB package to perform virtual knockout experiments on single-cell gene regulatory networks.

Using scTenifoldKnk for organisms other than human #13

Closed: Rohit-Satyam closed this issue 2 years ago

Rohit-Satyam commented 2 years ago

Hi Developers!!

I am working with Plasmodium and want to knock out a gene and see its effect. However, in Plasmodium the mitochondrial gene names match the "milo" regex.

  1. Could you add an option to provide a custom regex for mitochondrial genes (feature request)? Also, do you recommend using the package on non-human organisms?
  2. Could you also add a verbose parameter that prints which step is running? It would help us know what calculations or steps are running in the background. All I see right now are two progress bars.
  3. Is there a vignette that highlights the do's and don'ts of using this single function, such as which parameters are essential (e.g. nc_lambda and others) and how to explore the result object to gain insight into the knockout?

Rohit-Satyam commented 2 years ago

I tried running it on my wild-type sample with 5446 genes and 3486 cells and got only 37 DE genes with p.adj < 0.05 and LFC > 0.25 (all significant genes were upregulated, none downregulated). I then intersected these DEGs with the DEGs I obtained from the Seurat workflow. 27 of the 37 genes overlapped with the upregulated genes from Seurat. 5 of the 37 were upregulated here but downregulated in the Seurat DE list. Overall, the list of DEGs was small and downregulated genes were absent.

Any idea why that might be? Note: the single-cell knockout data that I currently have comes from a partial knockout of the gene, i.e. one of the two exons (the one that codes for the functional domain) is knocked out.

dosorio commented 2 years ago

Dear @Rohit-Satyam, thank you very much for using scTenifoldKnk. I will try my best to solve your questions below.

  1. About adding a new regex pattern to compute the mitochondrial content in other organisms: I think this is doable. However, since the proportion of mitochondrial content is tissue- and cell-type-specific (see: https://doi.org/10.1093/bioinformatics/btaa751), I don't think the thresholds defined for humans and mice transfer directly to other organisms. To avoid biases, I have added a new boolean argument, qc, that lets users skip the quality control of the data. You can set it to FALSE and provide scTenifoldKnk with your already preprocessed data.
  2. Thanks for your suggestion. I will definitely add a verbose argument in the next CRAN version.
  3. Thanks for your suggestion. The parameters of each function and their default values are described in the documentation. You can access it in R using ?scTenifoldKnk.
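
As a rough illustration of point 1, the sketch below does the mitochondrial filtering by hand with a custom regex and then skips the built-in quality control. It is only a sketch under assumptions: the toy matrix, the gene names, the "^milo" pattern, and the 0.2 threshold are all made up for illustration, and the countMatrix and gKO argument names should be checked against ?scTenifoldKnk before use.

```r
# Toy count matrix: genes in rows, cells in columns (all names are hypothetical).
counts <- matrix(
  c(1, 0, 9,  2, 0,   # milo-1
    0, 1, 8,  1, 0,   # milo-2
    5, 6, 1,  7, 4,   # geneA
    4, 3, 2, 10, 6),  # geneB
  nrow = 4, byrow = TRUE,
  dimnames = list(c("milo-1", "milo-2", "geneA", "geneB"),
                  paste0("cell", 1:5))
)

# Organism-specific pattern for mitochondrial gene names (assumption).
mt_pattern <- "^milo"
is_mt <- grepl(mt_pattern, rownames(counts))

# Fraction of each cell's counts that comes from mitochondrial genes.
mt_fraction <- colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)

# Keep cells at or below a threshold chosen for this organism and tissue
# (0.2 is an arbitrary value for the toy data, not a recommendation).
mt_threshold <- 0.2
filtered <- counts[, mt_fraction <= mt_threshold, drop = FALSE]

# The knockout call is switched off by default: it needs the package installed
# and a realistically sized matrix, which this toy example is not.
run_knockout <- FALSE
if (run_knockout) {
  result <- scTenifoldKnk::scTenifoldKnk(countMatrix = filtered,
                                         gKO = "geneA",
                                         qc = FALSE)
}
```

Here cell3 has 85% mitochondrial counts and is dropped, while the other four cells are kept and would be passed on with qc = FALSE.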

About your comparisons with real data: I want to clarify that scTenifoldKnk currently does not provide the directionality of the perturbation. The prediction is based on the distance on the manifold after aligning the wild-type and perturbed networks. I am happy to see that you recovered a large overlap between your real and in-silico knockouts.
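
Because the prediction carries no direction, one way to compare it with an experimental DE table is to intersect gene names only and take the direction from the experimental side. The following is a minimal sketch with made-up gene names and fold changes; the avg_log2FC column name is an assumption about how the DE table is laid out.

```r
# Hypothetical genes flagged by the in-silico knockout (no direction attached).
insilico_genes <- c("g1", "g2", "g3", "g4")

# Hypothetical experimental DE table; the column names are illustrative.
seurat_de <- data.frame(
  gene = c("g1", "g2", "g5", "g3"),
  avg_log2FC = c(0.8, -0.6, 1.2, 0.4)
)

# Direction-agnostic overlap: match on gene names only.
shared <- intersect(insilico_genes, seurat_de$gene)
overlap_fraction <- length(shared) / length(insilico_genes)

# Direction is read from the experimental table, not from the prediction.
shared_fc <- seurat_de$avg_log2FC[match(shared, seurat_de$gene)]
direction <- ifelse(shared_fc > 0, "up", "down")
names(direction) <- shared
```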

Please let me know if you have any other questions,

Best wishes,

Daniel

Rohit-Satyam commented 2 years ago

So, just to confirm: can the default parameters of scTenifoldKnk be used for Plasmodium or other related parasites? I was hoping to learn which arguments of the function must be changed when dealing with non-human scRNA-seq data.

Follow-up question: In malaria, we have cells coming from different time points of the life cycle (ring, early trophozoite, late trophozoite, schizont). Will this affect the knockout results, i.e. should the wild-type scRNA-seq sample be homogeneous? If so, should I first separate my wild-type cells by stage (giving 4 separate groups of cells) and then run scTenifoldKnk separately on each of the 4 groups?

dosorio commented 2 years ago

Hi @Rohit-Satyam,

You are correct. The default parameters are tuned to work well with sparse matrices from single-cell RNA-seq data in general, not just human samples. If you want to evaluate the function associated with the gene at different stages of parasite development, you should split your dataset accordingly and use the submatrices as input for scTenifoldKnk.
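
Splitting by stage could look roughly like the sketch below. The toy matrix, the stage labels, and the gene name passed as gKO are hypothetical, and the countMatrix and gKO argument names should be confirmed with ?scTenifoldKnk.

```r
# Toy count matrix: genes in rows, cells in columns (hypothetical names).
counts <- matrix(1:24, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:8)))

# One stage label per cell, in the same order as the columns (assumed annotation).
stage <- c("ring", "ring",
           "early_troph", "early_troph",
           "late_troph", "late_troph",
           "schizont", "schizont")

# Build one submatrix per stage.
by_stage <- lapply(split(colnames(counts), stage),
                   function(cells) counts[, cells, drop = FALSE])

# Run the knockout once per stage. Switched off by default because it needs the
# package installed and realistically sized input.
run_knockout <- FALSE
if (run_knockout) {
  results <- lapply(by_stage, function(m) {
    scTenifoldKnk::scTenifoldKnk(countMatrix = m, gKO = "gene1")
  })
}
```

Each element of by_stage keeps all genes and only the cells of one stage, so the per-stage results can be compared gene by gene afterwards.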

Please let me know if you have any other questions,

Best wishes,

Daniel