Closed Roger-GOAT closed 3 years ago
@WWXkenmo Hi, sorry for bothering you. Looking forward to your reply. Thanks a lot!
Sorry, I have been busy preparing for an examination. After checking your code, here are my responses and suggestions:
1) ICAnet uses random matrix theory (RMT) to estimate the number of independent components it needs to compute. In most cases the estimate does not exceed 50, unless you have a large sample size or skipped variable gene selection in the pre-processing step, so I use truncated SVD to find the eigenvalues and set the default dimension to 50. To fix this warning, first check that you have performed variable gene selection (the pre-processing steps should follow the code in "Integrating Multiple Single-Cell RNA-seq Dataset"; the only change needed is to replace ICAnet with ICAnetTF). Alternatively, run the updated code "ICAcomputing_RMT.R" with RMT.default = FALSE and svd.max = 300 or a larger number.
2) You can use any annotation file or feather file downloaded from https://resources.aertslab.org/cistarget/ . Just make sure the feather file is processed by the TF_Net_Generate function, the annotation file is loaded, and both are passed as inputs to RunICAnetTF.
If you have more questions, please don't hesitate to contact me; I will respond within a week.
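To illustrate the idea behind point 1, here is a minimal base-R sketch of one common RMT-style estimate: count the eigenvalues of a correlation matrix that exceed the Marchenko-Pastur upper edge. This is a generic illustration on simulated data, not ICAnet's actual implementation; the helper name `mp_count` and the toy matrices are assumptions made for the example.

```r
# Hypothetical helper (not part of ICAnet): count eigenvalues of the
# correlation matrix that lie above the Marchenko-Pastur upper edge.
# x: numeric matrix with rows = observations (n) and columns = variables (p).
mp_count <- function(x) {
  n <- nrow(x)
  p <- ncol(x)
  eig <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values
  upper_edge <- (1 + sqrt(p / n))^2  # largest eigenvalue expected from pure noise
  sum(eig > upper_edge)
}

set.seed(1)
n <- 2000
p <- 200

# Pure noise: few or no eigenvalues should exceed the edge.
noise <- matrix(rnorm(n * p), n, p)

# Noise plus three planted latent factors (simulated "expression programs").
factors  <- matrix(rnorm(n * 3), n, 3)
loadings <- matrix(rnorm(3 * p), 3, p)
signal   <- noise + factors %*% loadings

k_noise  <- mp_count(noise)
k_signal <- mp_count(signal)
print(c(noise = k_noise, signal = k_signal))
```

The count grows with the amount of structure in the data, which is consistent with the remark that skipping variable gene selection or using many samples can push the estimate above the default of 50.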
K.
I should also mention that the tutorial "Using ICAnetTF to identify TF-regulons in Single-Cell RNA-seq Dataset" only works for a single batch. If you have a multi-batch dataset, you need to run CrossBatchGrouping before running RunICAnet or RunICAnetTF.
K.
@WWXkenmo Thank you for your patience! I tried it this way:
> Motif_Net <- TF_Net_Generate("~/result/scenic/cisTarget_databases/mm9-500bp-upstream-7species.mc9nr.feather",cutoff=1)
Loading TF,motif annotation dataset...
Trimming the interaction...
Generating TF-Gene-Net...
Done
I loaded your new function (from ICAcomputing_RMT.R) under the name ICAcomputingN.
> Ica.pancreas <- ICAcomputingN(pancreas,ICA.type="JADE",RMT.default = FALSE,svd.max = 300,two.stage=FALSE)
[1] "batch 1 Indepdent Component Analysis"
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.2_ to umappower02_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.4_ to umappower04_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.7_ to umappower07_
[1] "emmm...centering..."
Centering data matrix
|===================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
Loading required package: coop
Loading required package: rARPACK
[1] "RMT estimate 60 expression programm"
[1] "batch 2 Indepdent Component Analysis"
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.2_ to umappower02_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.4_ to umappower04_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.7_ to umappower07_
[1] "emmm...centering..."
Centering data matrix
|===================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
[1] "RMT estimate 62 expression programm"
[1] "batch 3 Indepdent Component Analysis"
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.2_ to umappower02_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.4_ to umappower04_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.7_ to umappower07_
[1] "emmm...centering..."
Centering data matrix
|===================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
[1] "RMT estimate 59 expression programm"
[1] "batch 4 Indepdent Component Analysis"
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.2_ to umappower02_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.4_ to umappower04_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.7_ to umappower07_
[1] "emmm...centering..."
Centering data matrix
|===================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
[1] "RMT estimate 59 expression programm"
And then
> ica.pooling <- Ica.pancreas$ica.pooling
> ica.pooling <-CrossBatchGrouping(ica.pooling,k.max = (ncol(ica.pooling) - 1),
+ plot = TRUE,cor = "pearson",W.top = 2.5,filtering = TRUE,
+ threshold = 30, Unique.Preservation = TRUE)
**_Error in hclust(d, method = method) : must have n >= 2 objects to cluster_**
I skipped this step and kept going:
> mc9nr = read.table("./result/scenic/cisTarget_databases/motifs-v9-nr.mgi-m0.001-o0.0.tbl",
+ header=TRUE,sep=",")
> pancreas <- RunICAnetTF(pancreas, Ica.pancreas$ica.pooling, W.top.TFs=3, W.top.genes=2.5,aucMaxRank=600,
+ Motif_Net=Motif_Net,TF_motif_annot = mc9nr)
Running on TF-Gene Network
Running on the 1st component
Running on the 2st component
Running on the 3st component
...
...
Running on the 239st component
Running on the 240st component[1] "num:0"
Quantiles for the number of genes detected by cell:
(Non-detected genes are shuffled at the end of the ranking. Keep it in mind when choosing the threshold for calculating the AUC).
min 1% 5% 10% 50% 100%
85 126 141 149 178 236
Using 6 cores.
**_Error in .AUCell_calcAUC(geneSets = geneSets, rankings = rankings, nCores = nCores, :
geneSets should be a named list._**
It is the same error as the previous one.
I have just finished my examination. Let me check your code, and I will answer your issues soon!
Sorry for the long wait. I carefully checked your code and ran a toy example on my own dataset. Here is my code:
> Ica.brain <- ICAcomputingN(Brain,ICA.type="JADE",RMT.default = FALSE,svd.max = 300,two.stage=FALSE)
[1] "batch 1 Indepdent Component Analysis"
[1] "emmm...centering..."
Centering data matrix
|======================================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
[1] "RMT estimate 20 expression programm"
[1] "batch 2 Indepdent Component Analysis"
[1] "emmm...centering..."
Centering data matrix
|======================================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
[1] "RMT estimate 23 expression programm"
For the first bug, in 'CrossBatchGrouping', you could run the simplest call, like this, and it still works:
> ica.pooling <- Ica.brain$ica.pooling
> ica.pooling <-CrossBatchGrouping(ica.pooling,cor = "pearson", W.top = 2.5)
Identify5 patterns
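For reference, the `hclust` error reported earlier in this thread is raised by base R whenever fewer than two objects reach the clustering step. The sketch below reproduces the message on a toy matrix and shows a hypothetical guard (`safe_hclust` is an illustrative name, not an ICAnet function). The thread does not establish why CrossBatchGrouping ended up with fewer than two objects, so this only demonstrates the condition that triggers the message.

```r
# A matrix with a single row yields a distance object of Size 1,
# which hclust() rejects.
single <- matrix(c(0.2, 0.5, 0.9), nrow = 1)
msg <- tryCatch(
  {
    hclust(dist(single))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical guard: skip clustering when there is nothing to cluster.
safe_hclust <- function(m, method = "complete") {
  if (nrow(m) < 2) {
    warning("fewer than 2 rows; skipping clustering")
    return(NULL)
  }
  hclust(dist(m), method = method)
}

# With two rows, clustering proceeds normally.
two <- rbind(single, c(0.1, 0.4, 0.8))
fit <- safe_hclust(two)
```

Checking the number of rows (or columns, depending on what is clustered) in the input before the call is a quick way to tell whether an earlier filtering step removed almost everything.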
For the RunICAnetTF step, note that you can use the annotation dataset provided by RcisTarget:
> library(RcisTarget)
> data(motifAnnotations_mgi_v8)
> Motif_Net <- TF_Net_Generate("mm9-500bp-upstream-10species.mc8nr.feather",cutoff=1)
> Brain <- RunICAnetTF(Brain,ica.pooling,W.top.TFs=3,W.top.genes=2.5,aucMaxRank=600,Motif_Net=Motif_Net,TF_motif_annot=motifAnnotations_mgi_v8)
Also, if you want to build your own motif annotation data frame, make sure its format is the same as that of the data frame provided by RcisTarget:
> head(motifAnnotations_mgi_v8)
motif TF directAnnotation inferred_Orthology
1: bergman__Abd-B Hoxa10 FALSE TRUE
2: bergman__Abd-B Hoxa11 FALSE TRUE
3: bergman__Abd-B Hoxa13 FALSE TRUE
4: bergman__Abd-B Hoxa9 FALSE TRUE
5: bergman__Abd-B Hoxb13 FALSE TRUE
6: bergman__Abd-B Hoxb9 FALSE TRUE
inferred_MotifSimil annotationSource
1: FALSE inferredBy_Orthology
2: FALSE inferredBy_Orthology
3: FALSE inferredBy_Orthology
4: FALSE inferredBy_Orthology
5: FALSE inferredBy_Orthology
6: FALSE inferredBy_Orthology
description
1: gene is orthologous to FBgn0000015 in D. melanogaster (identity = 18%) which is directly annotated for motif
2: gene is orthologous to FBgn0000015 in D. melanogaster (identity = 25%) which is directly annotated for motif
3: gene is orthologous to FBgn0000015 in D. melanogaster (identity = 15%) which is directly annotated for motif
4: gene is orthologous to FBgn0000015 in D. melanogaster (identity = 23%) which is directly annotated for motif
5: gene is orthologous to FBgn0000015 in D. melanogaster (identity = 22%) which is directly annotated for motif
6: gene is orthologous to FBgn0000015 in D. melanogaster (identity = 27%) which is directly annotated for motif
> class(motifAnnotations_mgi_v8)
[1] "data.table" "data.frame"
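A custom annotation table can be compared against the column layout printed above before it is passed on. The following base-R sketch does that; `check_motif_annot` and the toy row are hypothetical, and whether RunICAnetTF needs every one of these columns is an assumption based on the advice above rather than something confirmed in this thread.

```r
# Column names taken from the head(motifAnnotations_mgi_v8) output above.
required_cols <- c("motif", "TF", "directAnnotation", "inferred_Orthology",
                   "inferred_MotifSimil", "annotationSource", "description")

# Hypothetical helper: report which expected columns are missing.
check_motif_annot <- function(annot) {
  stopifnot(is.data.frame(annot))
  missing <- setdiff(required_cols, colnames(annot))
  list(ok = length(missing) == 0, missing = missing)
}

# Toy table with one made-up row, for illustration only.
toy <- data.frame(
  motif = "bergman__Abd-B",
  TF = "Hoxa10",
  directAnnotation = FALSE,
  inferred_Orthology = TRUE,
  inferred_MotifSimil = FALSE,
  annotationSource = "inferredBy_Orthology",
  description = "toy row",
  stringsAsFactors = FALSE
)

res_ok  <- check_motif_annot(toy)                     # all columns present
res_bad <- check_motif_annot(toy[, c("motif", "TF")]) # five columns missing
print(res_bad$missing)
```

Since the RcisTarget object is a data.table, a plain data frame may additionally need converting with `data.table::as.data.table()`.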
Hope it's helpful.
K.
Thank you @WWXkenmo, I hope your examination went well! I still get the same error.
> ica.pooling <- Ica.pancreas$ica.pooling
> ica.pooling <- CrossBatchGrouping(ica.pooling,cor = "pearson", W.top = 2.5) #
Error in hclust(d, method = method) : must have n >= 2 objects to cluster
library(RcisTarget)
data(motifAnnotations_mgi_v8)
> pancreas <- RunICAnetTF(pancreas, Ica.pancreas$ica.pooling, W.top.TFs=3, W.top.genes=2.5,aucMaxRank=600,
+ Motif_Net=Motif_Net,TF_motif_annot = motifAnnotations_mgi_v8)
Running on TF-Gene Network
Running on the 1st component
...
...
Running on the 239st component
Running on the 240st component[1] "num:0"
Using 6 cores.
Error in .AUCell_calcAUC(geneSets = geneSets, rankings = rankings, nCores = nCores, :
geneSets should be a named list.
In addition: Warning message:
In .AUCell_buildRankings(exprMat = exprMat, plotStats = plotStats, :
There has been an error in plotGeneCount() [Message: figure margins too large]. Proceeding to calculate the rankings...
Is something wrong with my data?
Hi @WWXkenmo, sorry to bother you. Could you check where this error comes from?
Error in .AUCell_calcAUC(geneSets = geneSets, rankings = rankings, nCores = nCores, :
geneSets should be a named list.
Sorry for the delayed response. I checked the code and my data repeatedly and could not find any errors. Here are my suggestions: could you provide the dimensions and row names of your Ica.pancreas object? Also, make sure your DefaultAssay(pancreas) is RNA or SCT.
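The checks suggested above can be sketched in base R on toy matrices. The helper `gene_overlap` and the toy data are hypothetical, and it is an assumption that the rows of the ICA matrix are gene names that should match the rows of the active assay. On a real Seurat object, `DefaultAssay(pancreas)` reports the active assay and `DefaultAssay(pancreas) <- "RNA"` switches it.

```r
# Hypothetical helper: how many component genes are present in the assay?
gene_overlap <- function(component_genes, assay_genes) {
  shared <- intersect(component_genes, assay_genes)
  list(n_shared = length(shared),
       fraction = length(shared) / length(component_genes))
}

set.seed(1)
# Toy stand-in for the ICA matrix (genes x components).
ica_toy <- matrix(rnorm(12), nrow = 4,
                  dimnames = list(c("Ins1", "Gcg", "Sst", "Ppy"),
                                  paste0("IC", 1:3)))
# Toy stand-in for the expression matrix of the active assay (genes x cells).
expr_toy <- matrix(rpois(15, 2), nrow = 5,
                   dimnames = list(c("Ins1", "Gcg", "Sst", "Krt19", "Cpa1"),
                                   paste0("cell", 1:3)))

# The two things asked for above: dimensions and row names.
print(dim(ica_toy))
print(head(rownames(ica_toy)))

# A low overlap would suggest the components and the assay disagree on genes.
ov <- gene_overlap(rownames(ica_toy), rownames(expr_toy))
print(ov)
```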
K.
@WWXkenmo Thank you very much! The problem is solved! Maybe it was because I had not set DefaultAssay(pancreas) to RNA or SCT.
Dear team, sorry to bother you with another question. I used the pancreas object from "Integrate analysis of pancreatic islet scRNA-seq dataset". Everything works in that process. However, running RunICAnetTF gives an error.
When I set TF_motif_annot to NULL, I get the output below:
Q1: How can I fix the problem?
Q2: I get "warning! the number of significant component is too large" followed by [1] "RMT estimate 50 expression programm". Does this matter?
Q3: Can I use "mm10refseq-r80500bp_up_and_100bp_down_tss.mc9nr.feather"?
Thank you and best!