WWXkenmo / ICAnet

Independent Component Analysis decipher functional modules for better cell clustering and annotation

object 'motifAnnotations_mgi_v8' not found #4

Closed Roger-GOAT closed 3 years ago

Roger-GOAT commented 3 years ago

Dear team, sorry to bother you with another question. I used the pancreas object from "Integrate analysis of pancreatic islet scRNA-seq dataset", and everything in that process worked. However, running RunICAnetTF produces an error.

> Motif_Net <- TF_Net_Generate("~/result/scenic/cisTarget_databases/mm9-500bp-upstream-7species.mc9nr.feather",cutoff=1)
Loading TF,motif annotation dataset...
Trimming the interaction...
Generating TF-Gene-Net...
Done

> Ica.pancreas <- ICAcomputing(pancreas,ICA.type="JADE",RMT=TRUE,two.stage=FALSE)
[1] "batch 1 Indepdent Component Analysis"
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.2_ to umappower02_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.4_ to umappower04_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.7_ to umappower07_
[1] "emmm...centering..."
Centering data matrix
  |===================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
warning! the number of significant component is too large[1] "RMT estimate 50 expression programm"
[1] "batch 2 Indepdent Component Analysis"
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.2_ to umappower02_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.4_ to umappower04_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.7_ to umappower07_
[1] "emmm...centering..."
Centering data matrix
  |===================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
warning! the number of significant component is too large[1] "RMT estimate 50 expression programm"
[1] "batch 3 Indepdent Component Analysis"
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.2_ to umappower02_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.4_ to umappower04_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.7_ to umappower07_
[1] "emmm...centering..."
Centering data matrix
  |===================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
warning! the number of significant component is too large[1] "RMT estimate 50 expression programm"
[1] "batch 4 Indepdent Component Analysis"
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.2_ to umappower02_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.4_ to umappower04_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.7_ to umappower07_
[1] "emmm...centering..."
Centering data matrix
  |===================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
**warning! the number of significant component is too large[1] "RMT estimate 50 expression programm"**(_does this matter?_)

> pancreas  <- RunICAnetTF(pancreas,Ica.pancreas$ica.pooling, W.top.TFs=3, W.top.genes=2.5,aucMaxRank=600,Motif_Net=Motif_Net,TF_motif_annot=motifAnnotations_mgi_v8)
Running on TF-Gene NetworkError in table(TF_motif_annot$motif) : 
  object 'motifAnnotations_mgi_v8' not found

When I leave TF_motif_annot unset (NULL), I get the following:

> pancreas  <- RunICAnetTF(pancreas,Ica.pancreas$ica.pooling, W.top.TFs=3, W.top.genes=2.5,aucMaxRank=600,Motif_Net=Motif_Net)
Running on TF-Gene Network
Running on the 1st component
Running on the 2st component
Running on the 3st component
Running on the 4st component
Running on the 5st component
Running on the 6st component
Running on the 7st component
Running on the 8st component
Running on the 9st component
Running on the 10st component
Running on the 11st component
Running on the 12st component
Running on the 13st component
Running on the 14st component
Running on the 15st component
Running on the 16st component
Running on the 17st component
Running on the 18st component
Running on the 19st component
Running on the 20st component
Running on the 21st component
Running on the 22st component
Running on the 23st component
Running on the 24st component
Running on the 25st component
Running on the 26st component
Running on the 27st component
Running on the 28st component
Running on the 29st component
Running on the 30st component
Running on the 31st component
Running on the 32st component
Running on the 33st component
Running on the 34st component
Running on the 35st component
Running on the 36st component
Running on the 37st component
Running on the 38st component
Running on the 39st component
Running on the 40st component
Running on the 41st component
Running on the 42st component
Running on the 43st component
Running on the 44st component
Running on the 45st component
Running on the 46st component
Running on the 47st component
Running on the 48st component
Running on the 49st component
Running on the 50st component
Running on the 51st component
Running on the 52st component
Running on the 53st component
Running on the 54st component
Running on the 55st component
Running on the 56st component
Running on the 57st component
Running on the 58st component
Running on the 59st component
Running on the 60st component
Running on the 61st component
Running on the 62st component
Running on the 63st component
Running on the 64st component
Running on the 65st component
Running on the 66st component
Running on the 67st component
Running on the 68st component
Running on the 69st component
Running on the 70st component
Running on the 71st component
Running on the 72st component
Running on the 73st component
Running on the 74st component
Running on the 75st component
Running on the 76st component
Running on the 77st component
Running on the 78st component
Running on the 79st component
Running on the 80st component
Running on the 81st component
Running on the 82st component
Running on the 83st component
Running on the 84st component
Running on the 85st component
Running on the 86st component
Running on the 87st component
Running on the 88st component
Running on the 89st component
Running on the 90st component
Running on the 91st component
Running on the 92st component
Running on the 93st component
Running on the 94st component
Running on the 95st component
Running on the 96st component
Running on the 97st component
Running on the 98st component
Running on the 99st component
Running on the 100st component
Running on the 101st component
Running on the 102st component
Running on the 103st component
Running on the 104st component
Running on the 105st component
Running on the 106st component
Running on the 107st component
Running on the 108st component
Running on the 109st component
Running on the 110st component
Running on the 111st component
Running on the 112st component
Running on the 113st component
Running on the 114st component
Running on the 115st component
Running on the 116st component
Running on the 117st component
Running on the 118st component
Running on the 119st component
Running on the 120st component
Running on the 121st component
Running on the 122st component
Running on the 123st component
Running on the 124st component
Running on the 125st component
Running on the 126st component
Running on the 127st component
Running on the 128st component
Running on the 129st component
Running on the 130st component
Running on the 131st component
Running on the 132st component
Running on the 133st component
Running on the 134st component
Running on the 135st component
Running on the 136st component
Running on the 137st component
Running on the 138st component
Running on the 139st component
Running on the 140st component
Running on the 141st component
Running on the 142st component
Running on the 143st component
Running on the 144st component
Running on the 145st component
Running on the 146st component
Running on the 147st component
Running on the 148st component
Running on the 149st component
Running on the 150st component
Running on the 151st component
Running on the 152st component
Running on the 153st component
Running on the 154st component
Running on the 155st component
Running on the 156st component
Running on the 157st component
Running on the 158st component
Running on the 159st component
Running on the 160st component
Running on the 161st component
Running on the 162st component
Running on the 163st component
Running on the 164st component
Running on the 165st component
Running on the 166st component
Running on the 167st component
Running on the 168st component
Running on the 169st component
Running on the 170st component
Running on the 171st component
Running on the 172st component
Running on the 173st component
Running on the 174st component
Running on the 175st component
Running on the 176st component
Running on the 177st component
Running on the 178st component
Running on the 179st component
Running on the 180st component
Running on the 181st component
Running on the 182st component
Running on the 183st component
Running on the 184st component
Running on the 185st component
Running on the 186st component
Running on the 187st component
Running on the 188st component
Running on the 189st component
Running on the 190st component
Running on the 191st component
Running on the 192st component
Running on the 193st component
Running on the 194st component
Running on the 195st component
Running on the 196st component
Running on the 197st component
Running on the 198st component
Running on the 199st component
Running on the 200st component[1] "num:0"
Quantiles for the number of genes detected by cell: 
(Non-detected genes are shuffled at the end of the ranking. Keep it in mind when choosing the threshold for calculating the AUC).
 min   1%   5%  10%  50% 100% 
  85  126  141  149  178  236 
Using 6 cores.
Error in .AUCell_calcAUC(geneSets = geneSets, rankings = rankings, nCores = nCores,  : 
  geneSets should be a named list.

Q1: How do I fix this problem?
Q2: warning! the number of significant component is too large[1] "RMT estimate 50 expression programm" (does this matter?)
Q3: Can I use "mm10refseq-r80500bp_up_and_100bp_down_tss.mc9nr.feather"?

Thank you and best!

Roger-GOAT commented 3 years ago

@WWXkenmo Hi, sorry for bothering you. Looking forward to your reply. Thanks a lot!

WWXkenmo commented 3 years ago

Sorry, I have been very busy lately because I am preparing for an examination. After checking your code, here are my responses and suggestions:

1) ICAnet uses random matrix theory (RMT) to estimate the number of independent components it needs to calculate. In most cases the estimate is not larger than 50, unless you have a large sample size or did not perform variable gene selection in the preprocessing step. For that reason I use a truncated SVD to find the eigenvalues and set the default dimension to 50. To fix this warning, check whether you have done variable gene selection (the preprocessing steps should follow the code in "Integrating Multiple Single-Cell RNA-seq Dataset"; the only change needed is to replace ICAnet with ICAnetTF). Alternatively, run the updated code "ICAcomputing_RMT.R" with RMT.default = FALSE and svd.max = 300 or a larger number.

2) You can use any annotation file or feather file downloaded from https://resources.aertslab.org/cistarget/ . Just make sure the feather file is processed by the TF_Net_Generate function, the annotation file is loaded, and both are passed as inputs to RunICAnetTF.
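A minimal base-R sketch of the idea behind point 1, run on simulated data. It is not ICAnet's implementation: the Marchenko-Pastur upper edge used as the threshold, the simulated matrix, and every variable name here are assumptions chosen for illustration.

```r
# Simulated data: 500 cells x 100 genes (sizes are arbitrary, for illustration only).
set.seed(1)
n_cells <- 500
n_genes <- 100
n_programs <- 3

# Start from pure noise, then plant three programs that each drive a block of 20 genes.
expr <- matrix(rnorm(n_cells * n_genes), nrow = n_cells, ncol = n_genes)
for (k in seq_len(n_programs)) {
  block <- ((k - 1) * 20 + 1):(k * 20)
  activity <- rnorm(n_cells)                  # one activity value per cell
  expr[, block] <- expr[, block] + activity   # recycled down each column of the block
}

# Eigenvalues of the gene-gene correlation matrix.
eig <- eigen(cor(expr), symmetric = TRUE, only.values = TRUE)$values

# Marchenko-Pastur upper edge for a pure-noise correlation matrix (assumed threshold).
mp_upper <- (1 + sqrt(n_genes / n_cells))^2

# Components whose eigenvalue exceeds what noise alone would be expected to produce.
n_signif <- sum(eig > mp_upper)
print(n_signif)
```

Going by the maintainer's description, the warning appears to be raised when the estimate reaches the limit of the truncated decomposition (50 by default), and raising svd.max lifts that limit.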

If you have more questions, please do not hesitate to contact me; I will respond within a week.

K.

WWXkenmo commented 3 years ago

Also, I should mention that the tutorial "Using ICAnetTF to identify TF-regulons in Single-Cell RNA-seq Dataset" only works for a single batch. When you have a multi-batch dataset, you need to run CrossBatchGrouping before running RunICAnet or RunICAnetTF.

K.

Roger-GOAT commented 3 years ago

@WWXkenmo Thank you for your patience! I tried it this way:

> Motif_Net <- TF_Net_Generate("~/result/scenic/cisTarget_databases/mm9-500bp-upstream-7species.mc9nr.feather",cutoff=1)
Loading TF,motif annotation dataset...
Trimming the interaction...
Generating TF-Gene-Net...
Done

I loaded your new function (from ICAcomputing_RMT.R) under the name ICAcomputingN.

> Ica.pancreas <- ICAcomputingN(pancreas,ICA.type="JADE",RMT.default = FALSE,svd.max = 300,two.stage=FALSE)
[1] "batch 1 Indepdent Component Analysis"
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.2_ to umappower02_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.4_ to umappower04_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.7_ to umappower07_
[1] "emmm...centering..."
Centering data matrix
  |===================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
Loading required package: coop
Loading required package: rARPACK
[1] "RMT estimate 60 expression programm"
[1] "batch 2 Indepdent Component Analysis"
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.2_ to umappower02_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.4_ to umappower04_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.7_ to umappower07_
[1] "emmm...centering..."
Centering data matrix
  |===================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
[1] "RMT estimate 62 expression programm"
[1] "batch 3 Indepdent Component Analysis"
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.2_ to umappower02_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.4_ to umappower04_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.7_ to umappower07_
[1] "emmm...centering..."
Centering data matrix
  |===================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
[1] "RMT estimate 59 expression programm"
[1] "batch 4 Indepdent Component Analysis"
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.2_ to umappower02_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.4_ to umappower04_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from umap_power_0.7_ to umappower07_
[1] "emmm...centering..."
Centering data matrix
  |===================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
[1] "RMT estimate 59 expression programm"

And then

> ica.pooling <- Ica.pancreas$ica.pooling
> ica.pooling <-CrossBatchGrouping(ica.pooling,k.max = (ncol(ica.pooling) - 1),
+   plot = TRUE,cor = "pearson",W.top = 2.5,filtering = TRUE,
+   threshold = 30,  Unique.Preservation = TRUE)
**_Error in hclust(d, method = method) : must have n >= 2 objects to cluster_**

I skipped this step and kept going:

> mc9nr = read.table("./result/scenic/cisTarget_databases/motifs-v9-nr.mgi-m0.001-o0.0.tbl",
+                    header=TRUE,sep=",")
> pancreas  <- RunICAnetTF(pancreas, Ica.pancreas$ica.pooling, W.top.TFs=3, W.top.genes=2.5,aucMaxRank=600, 
+                          Motif_Net=Motif_Net,TF_motif_annot = mc9nr)
Running on TF-Gene Network
Running on the 1st component
Running on the 2st component
Running on the 3st component
...
...
Running on the 239st component
Running on the 240st component[1] "num:0"
Quantiles for the number of genes detected by cell: 
(Non-detected genes are shuffled at the end of the ranking. Keep it in mind when choosing the threshold for calculating the AUC).
 min   1%   5%  10%  50% 100% 
  85  126  141  149  178  236 
Using 6 cores.
**_Error in .AUCell_calcAUC(geneSets = geneSets, rankings = rankings, nCores = nCores,  : 
  geneSets should be a named list._**

It is the same error as the previous one.
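For readers unfamiliar with the message, here is a small base-R sketch of what "a named list" means. The helper is_named_list and the toy gene sets are hypothetical and are not part of AUCell or ICAnet. The `[1] "num:0"` line in the log suggests, though the thread does not confirm it, that no gene sets were produced, in which case the list passed on would be empty.

```r
# Hypothetical guard: TRUE only for a non-empty list in which every element has a non-empty name.
is_named_list <- function(x) {
  nms <- names(x)
  is.list(x) && length(x) > 0L && !is.null(nms) && all(!is.na(nms) & nzchar(nms))
}

# Toy inputs (made-up gene names, for illustration only).
empty_sets   <- list()
unnamed_sets <- list(c("GeneA", "GeneB"), c("GeneC", "GeneD"))
named_sets   <- list(regulon_1 = c("GeneA", "GeneB"),
                     regulon_2 = c("GeneC", "GeneD"))

is_named_list(empty_sets)    # an empty list has no names
is_named_list(unnamed_sets)  # elements exist but carry no names
is_named_list(named_sets)    # every element is named
```

Running a check like this on the gene sets before scoring would separate "nothing was found upstream" from a genuine problem in the scoring step.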

WWXkenmo commented 3 years ago

I just finished my examination. Let me check your code; I will answer your issues soon!

WWXkenmo commented 3 years ago

Sorry for the long wait. I carefully checked your code and ran a toy example on my own dataset. Here is my code:

> Ica.brain <- ICAcomputingN(Brain,ICA.type="JADE",RMT.default = FALSE,svd.max = 300,two.stage=FALSE)
[1] "batch 1 Indepdent Component Analysis"
[1] "emmm...centering..."
Centering data matrix
  |======================================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
[1] "RMT estimate 20 expression programm"
[1] "batch 2 Indepdent Component Analysis"
[1] "emmm...centering..."
Centering data matrix
  |======================================================================| 100%
[1] "Done Centering"
[1] "Using RMT to estimate number of module"
[1] "RMT estimate 23 expression programm"

For the first bug in 'CrossBatchGrouping', you could run the simplest call, like this, and it still works:

> ica.pooling <- Ica.brain$ica.pooling
> ica.pooling <-CrossBatchGrouping(ica.pooling,cor = "pearson", W.top = 2.5)
Identify5 patterns

For the RunICAnetTF step, I should remind you that you can use the annotation dataset provided by RcisTarget:

> library(RcisTarget)
> data(motifAnnotations_mgi_v8) 
> Motif_Net <- TF_Net_Generate("mm9-500bp-upstream-10species.mc8nr.feather",cutoff=1)
> Brain  <- RunICAnetTF(Brain,ica.pooling,W.top.TFs=3,W.top.genes=2.5,aucMaxRank=600,Motif_Net=Motif_Net,TF_motif_annot=motifAnnotations_mgi_v8)

Also, if you want to build your own motif annotation data frame, make sure its format is the same as that of the data frame provided by RcisTarget:

> head(motifAnnotations_mgi_v8)
            motif     TF directAnnotation inferred_Orthology
1: bergman__Abd-B Hoxa10            FALSE               TRUE
2: bergman__Abd-B Hoxa11            FALSE               TRUE
3: bergman__Abd-B Hoxa13            FALSE               TRUE
4: bergman__Abd-B  Hoxa9            FALSE               TRUE
5: bergman__Abd-B Hoxb13            FALSE               TRUE
6: bergman__Abd-B  Hoxb9            FALSE               TRUE
   inferred_MotifSimil     annotationSource
1:               FALSE inferredBy_Orthology
2:               FALSE inferredBy_Orthology
3:               FALSE inferredBy_Orthology
4:               FALSE inferredBy_Orthology
5:               FALSE inferredBy_Orthology
6:               FALSE inferredBy_Orthology
                                                                                                    description
1: gene is orthologous to FBgn0000015 in D. melanogaster (identity = 18%) which is directly annotated for motif
2: gene is orthologous to FBgn0000015 in D. melanogaster (identity = 25%) which is directly annotated for motif
3: gene is orthologous to FBgn0000015 in D. melanogaster (identity = 15%) which is directly annotated for motif
4: gene is orthologous to FBgn0000015 in D. melanogaster (identity = 23%) which is directly annotated for motif
5: gene is orthologous to FBgn0000015 in D. melanogaster (identity = 22%) which is directly annotated for motif
6: gene is orthologous to FBgn0000015 in D. melanogaster (identity = 27%) which is directly annotated for motif

> class(motifAnnotations_mgi_v8)
[1] "data.table" "data.frame"
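One way to check a custom table against the layout printed above, sketched in base R. The helper missing_annot_cols and the toy rows are hypothetical. The column list is copied from the head() output in this comment, and whether RunICAnetTF needs all of these columns or only some is not stated in the thread.

```r
# Column names as shown in head(motifAnnotations_mgi_v8) above.
expected_cols <- c("motif", "TF", "directAnnotation", "inferred_Orthology",
                   "inferred_MotifSimil", "annotationSource", "description")

# Hypothetical helper: returns the expected columns that are absent from a custom table.
missing_annot_cols <- function(annot) {
  stopifnot(is.data.frame(annot))
  setdiff(expected_cols, colnames(annot))
}

# Toy custom table with made-up rows, for illustration only.
custom_annot <- data.frame(
  motif               = c("toy__motif1", "toy__motif1"),
  TF                  = c("GeneA", "GeneB"),
  directAnnotation    = c(TRUE, FALSE),
  inferred_Orthology  = c(FALSE, TRUE),
  inferred_MotifSimil = c(FALSE, FALSE),
  annotationSource    = c("directAnnotation", "inferredBy_Orthology"),
  description         = c("toy row", "toy row"),
  stringsAsFactors    = FALSE
)

missing_annot_cols(custom_annot)                      # nothing missing
missing_annot_cols(custom_annot[, c("motif", "TF")])  # lists the five absent columns
```

A table read from a raw annotation file with different header names would show up here as missing columns before it ever reaches RunICAnetTF.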

Hope it's helpful.

K.

Roger-GOAT commented 3 years ago

Thank you @WWXkenmo, I hope your examination went well! I still get the same errors.

> ica.pooling <- Ica.pancreas$ica.pooling
> ica.pooling <- CrossBatchGrouping(ica.pooling,cor = "pearson", W.top = 2.5) #
Error in hclust(d, method = method) : must have n >= 2 objects to cluster
library(RcisTarget)
data(motifAnnotations_mgi_v8)
> pancreas  <- RunICAnetTF(pancreas, Ica.pancreas$ica.pooling, W.top.TFs=3, W.top.genes=2.5,aucMaxRank=600, 
+                          Motif_Net=Motif_Net,TF_motif_annot = motifAnnotations_mgi_v8)
Running on TF-Gene Network
Running on the 1st component
...
...
Running on the 239st component
Running on the 240st component[1] "num:0"
Using 6 cores.
Error in .AUCell_calcAUC(geneSets = geneSets, rankings = rankings, nCores = nCores,  : 
  geneSets should be a named list.
In addition: Warning message:
In .AUCell_buildRankings(exprMat = exprMat, plotStats = plotStats,  :
  There has been an error in plotGeneCount() [Message: figure margins too large]. Proceeding to calculate the rankings...

Is something wrong with my data?

Roger-GOAT commented 3 years ago

Hi @WWXkenmo , sorry for the bother, could you check where the errors come from?

Error in .AUCell_calcAUC(geneSets = geneSets, rankings = rankings, nCores = nCores,  : 
  geneSets should be a named list.

WWXkenmo commented 3 years ago

Sorry for the delayed response. I checked the code and my data repeatedly and could not find any errors. Here are my suggestions: could you provide the dimensions and row names of your Ica.pancreas object? Also, make sure your DefaultAssay(pancreas) is RNA or SCT.

K.

Roger-GOAT commented 3 years ago

@WWXkenmo thank you very much! The problem is solved! It was probably because I had not set DefaultAssay(pancreas) to RNA or SCT.
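A small sketch of the check that resolved this thread. The helper check_assay is hypothetical; DefaultAssay is Seurat's accessor, and the Seurat calls are shown only as comments so the block runs without Seurat installed. The allowed values "RNA" and "SCT" come from the maintainer's comment above.

```r
# Hypothetical helper: stop early if the active assay is not one the maintainer named.
check_assay <- function(assay_name, allowed = c("RNA", "SCT")) {
  if (!is.character(assay_name) || length(assay_name) != 1L) {
    stop("assay_name must be a single string")
  }
  if (!assay_name %in% allowed) {
    stop(sprintf("Default assay is '%s'; expected one of: %s",
                 assay_name, paste(allowed, collapse = ", ")))
  }
  invisible(TRUE)
}

# With Seurat loaded, the check and the fix would look like this (not run here):
#   check_assay(Seurat::DefaultAssay(pancreas))
#   Seurat::DefaultAssay(pancreas) <- "RNA"

check_assay("RNA")  # passes silently
```

An object that comes out of an integration workflow may have a different assay active, so running this check before RunICAnetTF is cheap insurance.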