ZJU-UoE-CCW-LAB / scCDC

single-cell Contamination Detection and Correction
GNU General Public License v3.0
6 stars 0 forks source link

NAs are not allowed in subscripted assignments during 'Calculating gene variances' Step #9

Open wsuplantpathology opened 1 week ago

wsuplantpathology commented 1 week ago

Dear authors,

I am running the scCDC (latest version), following your tutorial here https://htmlpreview.github.io/?https://github.com/ZJU-UoE-CCW-LAB/scCDC/blob/main/inst/doc/scCDC.html . I succeeded in Step '7 Contamination correction', and got corrected marix. Then in '8.1 Data normalization' step, I got error: Calculating gene variances 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| **************************************************| Error in hvf.info$variance.expected[not.const] <- 10^fit$fitted : NAs are not allowed in subscripted assignments Calls: FindVariableFeatures ... FindVariableFeatures.StdAssay -> hvf.function -> method -> VST.dgCMatrix Execution halted

Any ideas how to solve this? Thanks so much.

Chongjing

Stephen1202-Wang commented 1 week ago

Hi @wsuplantpathology,

Thank you for reporting this issue. We haven't encountered this error before, and it's difficult to determine its exact cause based solely on the error message. It appears that the issue might be related to the values in the matrix. Could you please ensure that the input expression matrix contains only integer values, without any unusual entries like 'Inf'? If the issue persists, we would appreciate it if you could share a sample of your data for testing. If the dataset is too large, a downsampled version would be sufficient.

wsuplantpathology commented 1 week ago

Hi @Stephen1202-Wang ,

Thanks for your quick response. Here is one samll sample generated from CellRanger count. Please have a check. filtered_feature_bc_matrix.zip

Also, I attach the running log of scCDC I got, error.log

BTW, here is my script: 01.Rscript.txt

Meanwhile, my guess is that there might introduced some infinit numbers during normalization step, so the variance couldn't be calculated. I am working in this, I'll update you later.

Thanks so much. Best, Chongjing

wsuplantpathology commented 1 week ago

Got one hint from here: https://github.com/satijalab/seurat/issues/3805#issuecomment-2385376499. but doesn't work.

wsuplantpathology commented 1 week ago

Hi Weijian @Stephen1202-Wang , I think I figured out why there are NAs during Seurat FindVariableFeatures step. It seems all GCGs in the corrected matrix have Inf counts after scCDC correction. Removeing all these GCGs from corrected matrix solved the error, and everything is good for subsequent analyses. Detailed log info is here, please have a check:

> filt.matrix <- Read10X_h5("filtered_feature_bc_matrix.h5", use.names=T)
> filt.matrix@Dimnames[[1]] <- gsub("_", "-", filt.matrix@Dimnames[[1]])
> filt.matrix@Dimnames[[2]] <- gsub("_", "-", filt.matrix@Dimnames[[2]])
> mislet_seuratobj <- CreateSeuratObject(counts = filt.matrix)
> mislet_seuratobj <- NormalizeData(mislet_seuratobj, normalization.method = "LogNormalize",scale.factor = 10000)
Normalizing layer: counts
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> mislet_seuratobj <- FindVariableFeatures(mislet_seuratobj, selection.method = "vst", nfeatures = 2000)
Finding variable features for layer counts
Calculating gene variances
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> all.genes <- rownames(mislet_seuratobj)
> mislet_seuratobj <- ScaleData(mislet_seuratobj, features = all.genes)
Centering and scaling data matrix
  |======================================================================| 100%
> mislet_seuratobj <- RunPCA(mislet_seuratobj, features = VariableFeatures(object = mislet_seuratobj))
> mislet_seuratobj <- RunUMAP(mislet_seuratobj,dims=1:20)
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
16:26:01 UMAP embedding parameters a = 0.9922 b = 1.112
16:26:01 Read 10782 rows and found 20 numeric columns
16:26:01 Using Annoy for neighbor search, n_neighbors = 30
16:26:01 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
16:26:03 Writing NN index file to temp file /tmp/RtmpoZE4fd/fileef52951c1eb67
16:26:03 Searching Annoy index using 1 thread, search_k = 3000
16:26:07 Annoy recall = 100%
16:26:07 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
16:26:09 Initializing from normalized Laplacian + noise (using RSpectra)
16:26:09 Commencing optimization for 200 epochs, with 488914 positive edges
Using method 'umap'
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
16:26:19 Optimization finished
> GCGs <- ContaminationDetection(mislet_seuratobj,restriction_factor = 0.5,
                                        sample_name = "AVS_0d",out_path.plot = "./",
                                        out_path.table = "./")
Caculating entropy...
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=06m 03s
Caculating expression level...
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=05m 47s
Calculating entropy-expression relation...
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=06s
Extracting contamination degree...
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s
Complete detection. 45 contaminated genes found
Warning messages:
1: In asMethod(object) :
  sparse->dense coercion: allocating vector of size 23.8 GiB
2: In asMethod(object) :
  sparse->dense coercion: allocating vector of size 23.8 GiB
3: In asMethod(object) :
  sparse->dense coercion: allocating vector of size 23.8 GiB
> mislet_cont_ratio <- ContaminationQuantification(mislet_seuratobj,rownames(GCGs))
Calculating contamination ratio...
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=45s
The maximum contamination ratio is:0.04434, which indicates high contamination in this dataset. scCDC is highly recommended to be applied.
There were 45 warnings (use warnings() to see them)
> mislet_seuratobj_corrected <- ContaminationCorrection(mislet_seuratobj, rownames(GCGs))
Calculating correction threshold...
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=46s
Decontaminating...
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s
There were 50 or more warnings (use warnings() to see the first 50)
> DefaultAssay(mislet_seuratobj_corrected) <- "Corrected"
> mislet_seuratobj_corrected <- NormalizeData(mislet_seuratobj_corrected, normalization.method = "LogNormalize",scale.factor = 10000)
Normalizing layer: counts
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> mislet_seuratobj_corrected <- FindVariableFeatures(mislet_seuratobj_corrected, selection.method = "vst", nfeatures = 2000)
Finding variable features for layer counts
Calculating gene variances
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Error in hvf.info$variance.expected[not.const] <- 10^fit$fitted :
  NAs are not allowed in subscripted assignments

As you can see, the error is there. Now Let's find genes with Infs.

> any(mislet_seuratobj_corrected@assays$RNA@layers$counts@x == Inf)
[1] FALSE
> any(mislet_seuratobj_corrected@assays$Corrected@layers$counts@x == Inf)
[1] TRUE
> gene_names <- mislet_seuratobj_corrected@assays$Corrected@features@dimnames[[1]]
> inf_genes <- which(is.infinite(mislet_seuratobj_corrected@assays$Corrected@layers$counts@x))
> row_indices <- mislet_seuratobj_corrected@assays$Corrected@layers$counts@i[inf_genes] + 1
> unique_row_indices <- unique(row_indices)
> gene_names[unique_row_indices]
 [1] "TraesCS2B03G0075700"   "TraesCS3A03G0593600"   "TraesCS3B03G0518600"
 [4] "TraesCS3B03G0675200"   "TraesCS3B03G1196000"   "TraesCS3B03G1406700"
 [7] "TraesCS3D03G0487500"   "TraesCS3D03G0492500"   "TraesCS3D03G0966500"
[10] "TraesCS4A03G0484800"   "TraesCS4A03G1019900"   "TraesCS4B03G0124800"
[13] "TraesCS4B03G0327500"   "TraesCS4B03G0906300"   "TraesCS4B03G0941400"
[16] "TraesCS4D03G0267700"   "TraesCS4D03G0797500"   "TraesCS5A03G0447800"
[19] "TraesCS5A03G0448500"   "TraesCS5B03G0148400"   "TraesCS5B03G0441800"
[22] "TraesCS5D03G0048200"   "TraesCS5D03G0168200"   "TraesCS5D03G0413700"
[25] "TraesCS5D03G0414200"   "TraesCS5D03G0741600"   "TraesCS6A03G0472100"
[28] "TraesCS6A03G0604400"   "TraesCS6D03G0113500"   "TraesCS6D03G0512200"
[31] "TraesCS7B03G0849200"   "TraesCS7D03G0623200"   "TraesCS7D03G0884200"
[34] "TraeCSChr1A99G0002750" "TraeCSChr1B99G0004362" "TraeCSChr1B99G0006836"
[37] "TraeCSChr1B99G0006955" "TraeCSChr1B99G0006956" "TraeCSChr1B99G0006958"
[40] "TraeCSChr1D99G0010847" "TraeCSChr1D99G0010853" "TraeCSChr1D99G0010855"
[43] "TraeCSChr7A99G0077611" "TraeCSChr7B99G0085835" "TraeCSChr7D99G0087650"
> rownames(GCGs)
 [1] "TraeCSChr7A99G0077611" "TraeCSChr7D99G0087650" "TraesCS7D03G0884200"
 [4] "TraesCS5D03G0048200"   "TraeCSChr7B99G0085835" "TraesCS5B03G0148400"
 [7] "TraesCS4B03G0941400"   "TraesCS6D03G0512200"   "TraesCS3D03G0966500"
[10] "TraeCSChr1D99G0010855" "TraesCS6A03G0604400"   "TraesCS5D03G0168200"
[13] "TraeCSChr1B99G0006836" "TraesCS7B03G0849200"   "TraesCS3D03G0487500"
[16] "TraesCS4D03G0797500"   "TraesCS5A03G0448500"   "TraesCS7D03G0623200"
[19] "TraesCS3D03G0492500"   "TraeCSChr1D99G0010847" "TraeCSChr1B99G0006956"
[22] "TraesCS3B03G0675200"   "TraeCSChr1B99G0006955" "TraeCSChr1A99G0002750"
[25] "TraesCS3B03G1196000"   "TraesCS4B03G0124800"   "TraesCS3A03G0593600"
[28] "TraeCSChr1B99G0004362" "TraesCS4A03G1019900"   "TraesCS4B03G0906300"
[31] "TraesCS3B03G1406700"   "TraesCS5D03G0414200"   "TraesCS5A03G0447800"
[34] "TraesCS5D03G0413700"   "TraesCS4A03G0484800"   "TraesCS5B03G0441800"
[37] "TraesCS3B03G0518600"   "TraesCS4B03G0327500"   "TraesCS4D03G0267700"
[40] "TraeCSChr1B99G0006958" "TraesCS6D03G0113500"   "TraesCS5D03G0741600"
[43] "TraeCSChr1D99G0010853" "TraesCS2B03G0075700"   "TraesCS6A03G0472100"

At this point, you can see the genes with Inf are identical with GCGs. Then I just remove these genes with Inf from scCDC corrected matrix.

> '%!in%' <- function(x,y)!('%in%'(x,y))
> RowsKEEP<-rownames(mislet_seuratobj_corrected)[rownames(mislet_seuratobj_corrected) %!in% gene_names[unique_row_indices]]
> mislet_seuratobj_corrected_noInf <- subset(mislet_seuratobj_corrected,features=RowsKEEP)
> DefaultAssay(mislet_seuratobj_corrected_noInf) <- "Corrected"
> mislet_seuratobj_corrected_noInf <- NormalizeData(mislet_seuratobj_corrected_noInf, normalization.method = "LogNormalize",scale.factor = 10000)
Normalizing layer: counts
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> mislet_seuratobj_corrected_noInf <- FindVariableFeatures(mislet_seuratobj_corrected_noInf, selection.method = "vst", nfeatures = 2000)
Finding variable features for layer counts
Calculating gene variances
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> all.genes <- rownames(mislet_seuratobj_corrected_noInf)
> mislet_seuratobj_corrected_noInf <- ScaleData(mislet_seuratobj_corrected_noInf, features = all.genes)
Centering and scaling data matrix
  |======================================================================| 100%
> mislet_seuratobj_corrected_noInf <- RunPCA(mislet_seuratobj_corrected_noInf, features = VariableFeatures(object = mislet_seuratobj_corrected_noInf))
> mislet_seuratobj_corrected_noInf <- FindNeighbors(mislet_seuratobj_corrected_noInf, dims = 1:15)
> mislet_seuratobj_corrected_noInf <- FindClusters(mislet_seuratobj_corrected_noInf, resolution = 0.2,verbose = F)
> mislet_seuratobj_corrected_noInf <- RunUMAP(mislet_seuratobj_corrected_noInf,dims=1:15)
17:40:31 UMAP embedding parameters a = 0.9922 b = 1.112
17:40:31 Read 10782 rows and found 15 numeric columns
17:40:31 Using Annoy for neighbor search, n_neighbors = 30
17:40:31 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
17:40:33 Writing NN index file to temp file /tmp/RtmpoZE4fd/fileef52922565f5b
17:40:33 Searching Annoy index using 1 thread, search_k = 3000
17:40:36 Annoy recall = 100%
17:40:37 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
17:40:37 33 smooth knn distance failures
17:40:39 Initializing from normalized Laplacian + noise (using RSpectra)
17:40:39 Commencing optimization for 200 epochs, with 485210 positive edges
Using method 'umap'
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
17:40:47 Optimization finished

So far, everything is good, I just stoped here.

Then My question is, why all GCGs were corrected to Inf by scCDC? Any suggestions on this, is it safe to just remove these genes? Do you think it's worth to test other Normailization methods in NormalizeData step before and after scCDC? Thanks, appreciate your help.

Best, Chongjing

Stephen1202-Wang commented 1 week ago

Hi Chongjing @wsuplantpathology,

Thank you for sharing the detailed information. After reviewing your code, I noticed a missing step in the data preprocessing related to cell clustering, which is causing the downstream analysis error. scCDC requires cell clustering information, and in the tutorial, we used a reference cell type annotation. If reference annotation is not available, users should perform the clustering themselves. To do this, you can use the FindNeighbors() and FindClusters() functions in Seurat. For a detailed walkthrough of this procedure, please refer to the Seurat tutorial: https://satijalab.org/seurat/articles/pbmc3k_tutorial.html

wsuplantpathology commented 1 week ago

Thanks, Weijian @Stephen1202-Wang . I didn't realize the reference annotation part. After added the annotation, when running GCGs <- ContaminationDetection(mislet_seuratobj,restriction_factor = 0.5, sample_name = "AVS_0d",out_path.plot = "./", out_path.table = "./"), I got new error:

> GCGs <- ContaminationDetection(mislet_seuratobj)
Caculating entropy...
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=36m 03s
Caculating expression level...
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=06m 42s
Calculating entropy-expression relation...
  |                                                  | 0 % ~calculating  Error in smooth.spline(tmp$mean.expr, tmp$entropy, spar = 1) :
  'tol' must be strictly positive and finite

I am still working on this. Do you have any ideas on this? Here is structure my mislet_seuratobj:

> str(mislet_seuratobj)
Formal class 'Seurat' [package "SeuratObject"] with 13 slots
  ..@ assays      :List of 1
  .. ..$ RNA:Formal class 'Assay5' [package "SeuratObject"] with 8 slots
  .. .. .. ..@ layers    :List of 3
  .. .. .. .. ..$ counts    :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. .. .. .. ..@ i       : int [1:6165532] 228 275 510 648 815 1154 1994 2256 2294 2498 ...
  .. .. .. .. .. .. ..@ p       : int [1:11520] 0 461 785 1492 2026 2746 3247 3652 4168 4562 ...
  .. .. .. .. .. .. ..@ Dim     : int [1:2] 296833 11519
  .. .. .. .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. .. ..@ x       : num [1:6165532] 1 1 1 1 1 1 1 1 2 2 ...
  .. .. .. .. .. .. ..@ factors : list()
  .. .. .. .. ..$ data      :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. .. .. .. ..@ i       : int [1:6165532] 228 275 510 648 815 1154 1994 2256 2294 2498 ...
  .. .. .. .. .. .. ..@ p       : int [1:11520] 0 461 785 1492 2026 2746 3247 3652 4168 4562 ...
  .. .. .. .. .. .. ..@ Dim     : int [1:2] 296833 11519
  .. .. .. .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. .. ..@ x       : num [1:6165532] 3.04 3.04 3.04 3.04 3.04 ...
  .. .. .. .. .. .. ..@ factors : list()
  .. .. .. .. ..$ scale.data: num [1:296833, 1:11519] -0.00932 -0.0647 -0.02023 -0.11578 0 ...
  .. .. .. ..@ cells     :Formal class 'LogMap' [package "SeuratObject"] with 1 slot
  .. .. .. .. .. ..@ .Data: logi [1:11519, 1:3] TRUE TRUE TRUE TRUE TRUE TRUE ...
  .. .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. .. .. ..$ : chr [1:11519] "AAACCCACAAAGCAAT-1" "AAACCCACAAGTTCGT-1" "AAACCCACACCCGTAG-1" "AAACCCACAGCTGTCG-1" ...
  .. .. .. .. .. .. .. ..$ : chr [1:3] "counts" "data" "scale.data"
  .. .. .. .. .. ..$ dim     : int [1:2] 11519 3
  .. .. .. .. .. ..$ dimnames:List of 2
  .. .. .. .. .. .. ..$ : chr [1:11519] "AAACCCACAAAGCAAT-1" "AAACCCACAAGTTCGT-1" "AAACCCACACCCGTAG-1" "AAACCCACAGCTGTCG-1" ...
  .. .. .. .. .. .. ..$ : chr [1:3] "counts" "data" "scale.data"
  .. .. .. ..@ features  :Formal class 'LogMap' [package "SeuratObject"] with 1 slot
  .. .. .. .. .. ..@ .Data: logi [1:296833, 1:3] TRUE TRUE TRUE TRUE TRUE TRUE ...
  .. .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. .. .. ..$ : chr [1:296833] "TraesCS1A03G0000200" "TraesCS1A03G0000400" "TraesCS1A03G0000600" "TraesCS1A03G0000800" ...
  .. .. .. .. .. .. .. ..$ : chr [1:3] "counts" "data" "scale.data"
  .. .. .. .. .. ..$ dim     : int [1:2] 296833 3
  .. .. .. .. .. ..$ dimnames:List of 2
  .. .. .. .. .. .. ..$ : chr [1:296833] "TraesCS1A03G0000200" "TraesCS1A03G0000400" "TraesCS1A03G0000600" "TraesCS1A03G0000800" ...
  .. .. .. .. .. .. ..$ : chr [1:3] "counts" "data" "scale.data"
  .. .. .. ..@ default   : int 1
  .. .. .. ..@ assay.orig: chr(0)
  .. .. .. ..@ meta.data :'data.frame': 296833 obs. of  8 variables:
  .. .. .. .. ..$ vf_vst_counts_mean                 : num [1:296833] 8.68e-05 4.34e-03 4.34e-04 1.41e-02 0.00 ...
  .. .. .. .. ..$ vf_vst_counts_variance             : num [1:296833] 8.68e-05 4.32e-03 4.34e-04 1.46e-02 0.00 ...
  .. .. .. .. ..$ vf_vst_counts_variance.expected    : num [1:296833] 8.68e-05 4.61e-03 4.59e-04 1.53e-02 0.00 ...
  .. .. .. .. ..$ vf_vst_counts_variance.standardized: num [1:296833] 1 0.937 0.945 0.953 0 ...
  .. .. .. .. ..$ vf_vst_counts_variable             : logi [1:296833] FALSE FALSE FALSE FALSE FALSE FALSE ...
  .. .. .. .. ..$ vf_vst_counts_rank                 : int [1:296833] NA NA NA NA NA NA NA NA NA NA ...
  .. .. .. .. ..$ var.features                       : chr [1:296833] NA NA NA NA ...
  .. .. .. .. ..$ var.features.rank                  : int [1:296833] NA NA NA NA NA NA NA NA NA NA ...
  .. .. .. ..@ misc      : Named list()
  .. .. .. ..@ key       : chr "rna_"
  ..@ meta.data   :'data.frame':        11519 obs. of  7 variables:
  .. ..$ orig.ident     : Factor w/ 1 level "SeuratProject": 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ nCount_RNA     : num [1:11519] 503 383 795 630 927 571 445 592 450 627 ...
  .. ..$ nFeature_RNA   : int [1:11519] 461 324 707 534 720 501 405 516 394 575 ...
  .. ..$ percent.mt     : num [1:11519] 0 0 0 0 0 0 0 0 0 0 ...
  .. ..$ percent.chlor  : num [1:11519] 0 0 0 0 0 0 0 0 0 0 ...
  .. ..$ RNA_snn_res.0.5: Factor w/ 14 levels "0","1","2","3",..: 1 3 1 1 1 3 1 11 11 5 ...
  .. ..$ seurat_clusters: Factor w/ 14 levels "0","1","2","3",..: 1 3 1 1 1 3 1 11 11 5 ...
  ..@ active.assay: chr "RNA"
  ..@ active.ident: Factor w/ 14 levels "0","1","2","3",..: 1 3 1 1 1 3 1 11 11 5 ...
  .. ..- attr(*, "names")= chr [1:11519] "AAACCCACAAAGCAAT-1" "AAACCCACAAGTTCGT-1" "AAACCCACACCCGTAG-1" "AAACCCACAGCTGTCG-1" ...
  ..@ graphs      :List of 2
  .. ..$ RNA_nn :Formal class 'Graph' [package "SeuratObject"] with 7 slots
  .. .. .. ..@ assay.used: chr "RNA"
  .. .. .. ..@ i         : int [1:230380] 0 236 328 2165 2348 2921 3004 3074 3164 5209 ...
  .. .. .. ..@ p         : int [1:11520] 0 19 45 70 92 113 143 165 182 198 ...
  .. .. .. ..@ Dim       : int [1:2] 11519 11519
  .. .. .. ..@ Dimnames  :List of 2
  .. .. .. .. ..$ : chr [1:11519] "AAACCCACAAAGCAAT-1" "AAACCCACAAGTTCGT-1" "AAACCCACACCCGTAG-1" "AAACCCACAGCTGTCG-1" ...
  .. .. .. .. ..$ : chr [1:11519] "AAACCCACAAAGCAAT-1" "AAACCCACAAGTTCGT-1" "AAACCCACACCCGTAG-1" "AAACCCACAGCTGTCG-1" ...
  .. .. .. ..@ x         : num [1:230380] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..@ factors   : list()
  .. ..$ RNA_snn:Formal class 'Graph' [package "SeuratObject"] with 7 slots
  .. .. .. ..@ assay.used: chr "RNA"
  .. .. .. ..@ i         : int [1:732079] 0 236 328 380 422 581 794 868 1046 1484 ...
  .. .. .. ..@ p         : int [1:11520] 0 78 139 205 281 357 430 509 558 618 ...
  .. .. .. ..@ Dim       : int [1:2] 11519 11519
  .. .. .. ..@ Dimnames  :List of 2
  .. .. .. .. ..$ : chr [1:11519] "AAACCCACAAAGCAAT-1" "AAACCCACAAGTTCGT-1" "AAACCCACACCCGTAG-1" "AAACCCACAGCTGTCG-1" ...
  .. .. .. .. ..$ : chr [1:11519] "AAACCCACAAAGCAAT-1" "AAACCCACAAGTTCGT-1" "AAACCCACACCCGTAG-1" "AAACCCACAGCTGTCG-1" ...
  .. .. .. ..@ x         : num [1:732079] 1 0.1765 0.1765 0.1111 0.0811 ...
  .. .. .. ..@ factors   : list()
  ..@ neighbors   : list()
  ..@ reductions  :List of 2
  .. ..$ pca :Formal class 'DimReduc' [package "SeuratObject"] with 9 slots
  .. .. .. ..@ cell.embeddings           : num [1:11519, 1:50] 0.3828 0.0743 1.5324 0.689 1.5465 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:11519] "AAACCCACAAAGCAAT-1" "AAACCCACAAGTTCGT-1" "AAACCCACACCCGTAG-1" "AAACCCACAGCTGTCG-1" ...
  .. .. .. .. .. ..$ : chr [1:50] "PC_1" "PC_2" "PC_3" "PC_4" ...
  .. .. .. ..@ feature.loadings          : num [1:1095, 1:50] -0.21856 -0.20025 -0.18172 -0.04436 -0.00252 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:1095] "TraesCS3A03G1257300" "TraesCS3D03G1184000" "TraesCS3B03G1502700" "TraesCS6B03G1183800" ...
  .. .. .. .. .. ..$ : chr [1:50] "PC_1" "PC_2" "PC_3" "PC_4" ...
  .. .. .. ..@ feature.loadings.projected: num[0 , 0 ]
  .. .. .. ..@ assay.used                : chr "RNA"
  .. .. .. ..@ global                    : logi FALSE
  .. .. .. ..@ stdev                     : num [1:50] 3.26 2.97 2.73 2.35 2.16 ...
  .. .. .. ..@ jackstraw                 :Formal class 'JackStrawData' [package "SeuratObject"] with 4 slots
  .. .. .. .. .. ..@ empirical.p.values     : num[0 , 0 ]
  .. .. .. .. .. ..@ fake.reduction.scores  : num[0 , 0 ]
  .. .. .. .. .. ..@ empirical.p.values.full: num[0 , 0 ]
  .. .. .. .. .. ..@ overall.p.values       : num[0 , 0 ]
  .. .. .. ..@ misc                      :List of 1
  .. .. .. .. ..$ total.variance: num 413
  .. .. .. ..@ key                       : chr "PC_"
  .. ..$ umap:Formal class 'DimReduc' [package "SeuratObject"] with 9 slots
  .. .. .. ..@ cell.embeddings           : num [1:11519, 1:2] -4.13 -2.41 -4.02 -5.27 -4.17 ...
  .. .. .. .. ..- attr(*, "scaled:center")= num [1:2] -0.597 -1.053
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:11519] "AAACCCACAAAGCAAT-1" "AAACCCACAAGTTCGT-1" "AAACCCACACCCGTAG-1" "AAACCCACAGCTGTCG-1" ...
  .. .. .. .. .. ..$ : chr [1:2] "umap_1" "umap_2"
  .. .. .. ..@ feature.loadings          : num[0 , 0 ]
  .. .. .. ..@ feature.loadings.projected: num[0 , 0 ]
  .. .. .. ..@ assay.used                : chr "RNA"
  .. .. .. ..@ global                    : logi TRUE
  .. .. .. ..@ stdev                     : num(0)
  .. .. .. ..@ jackstraw                 :Formal class 'JackStrawData' [package "SeuratObject"] with 4 slots
  .. .. .. .. .. ..@ empirical.p.values     : num[0 , 0 ]
  .. .. .. .. .. ..@ fake.reduction.scores  : num[0 , 0 ]
  .. .. .. .. .. ..@ empirical.p.values.full: num[0 , 0 ]
  .. .. .. .. .. ..@ overall.p.values       : num[0 , 0 ]
  .. .. .. ..@ misc                      : list()
  .. .. .. ..@ key                       : chr "umap_"
  ..@ images      : list()
  ..@ project.name: chr "SeuratProject"
  ..@ misc        : list()
  ..@ version     :Classes 'package_version', 'numeric_version'  hidden list of 1
  .. ..$ : int [1:3] 5 0 2
  ..@ commands    :List of 7
  .. ..$ NormalizeData.RNA       :Formal class 'SeuratCommand' [package "SeuratObject"] with 5 slots
  .. .. .. ..@ name       : chr "NormalizeData.RNA"
  .. .. .. ..@ time.stamp : POSIXct[1:1], format: "2024-10-02 11:49:49"
  .. .. .. ..@ assay.used : chr "RNA"
  .. .. .. ..@ call.string: chr [1:2] "NormalizeData(mislet_seuratobj, normalization.method = \"LogNormalize\", " "    scale.factor = 10000)"
  .. .. .. ..@ params     :List of 5
  .. .. .. .. ..$ assay               : chr "RNA"
  .. .. .. .. ..$ normalization.method: chr "LogNormalize"
  .. .. .. .. ..$ scale.factor        : num 10000
  .. .. .. .. ..$ margin              : num 1
  .. .. .. .. ..$ verbose             : logi TRUE
  .. ..$ FindVariableFeatures.RNA:Formal class 'SeuratCommand' [package "SeuratObject"] with 5 slots
  .. .. .. ..@ name       : chr "FindVariableFeatures.RNA"
  .. .. .. ..@ time.stamp : POSIXct[1:1], format: "2024-10-02 11:50:30"
  .. .. .. ..@ assay.used : chr "RNA"
  .. .. .. ..@ call.string: chr [1:2] "FindVariableFeatures(mislet_seuratobj, selection.method = \"vst\", " "    nfeatures = 2000)"
  .. .. .. ..@ params     :List of 12
  .. .. .. .. ..$ assay              : chr "RNA"
  .. .. .. .. ..$ selection.method   : chr "vst"
  .. .. .. .. ..$ loess.span         : num 0.3
  .. .. .. .. ..$ clip.max           : chr "auto"
  .. .. .. .. ..$ mean.function      :function (mat, display_progress)
  .. .. .. .. ..$ dispersion.function:function (mat, display_progress)
  .. .. .. .. ..$ num.bin            : num 20
  .. .. .. .. ..$ binning.method     : chr "equal_width"
  .. .. .. .. ..$ nfeatures          : num 2000
  .. .. .. .. ..$ mean.cutoff        : num [1:2] 0.1 8
  .. .. .. .. ..$ dispersion.cutoff  : num [1:2] 1 Inf
  .. .. .. .. ..$ verbose            : logi TRUE
  .. ..$ ScaleData.RNA           :Formal class 'SeuratCommand' [package "SeuratObject"] with 5 slots
  .. .. .. ..@ name       : chr "ScaleData.RNA"
  .. .. .. ..@ time.stamp : POSIXct[1:1], format: "2024-10-02 11:53:35"
  .. .. .. ..@ assay.used : chr "RNA"
  .. .. .. ..@ call.string: chr "ScaleData(mislet_seuratobj, features = all.genes)"
  .. .. .. ..@ params     :List of 10
  .. .. .. .. ..$ features          : chr [1:296833] "TraesCS1A03G0000200" "TraesCS1A03G0000400" "TraesCS1A03G0000600" "TraesCS1A03G0000800" ...
  .. .. .. .. ..$ assay             : chr "RNA"
  .. .. .. .. ..$ model.use         : chr "linear"
  .. .. .. .. ..$ use.umi           : logi FALSE
  .. .. .. .. ..$ do.scale          : logi TRUE
  .. .. .. .. ..$ do.center         : logi TRUE
  .. .. .. .. ..$ scale.max         : num 10
  .. .. .. .. ..$ block.size        : num 1000
  .. .. .. .. ..$ min.cells.to.block: num 3000
  .. .. .. .. ..$ verbose           : logi TRUE
  .. ..$ RunPCA.RNA              :Formal class 'SeuratCommand' [package "SeuratObject"] with 5 slots
  .. .. .. ..@ name       : chr "RunPCA.RNA"
  .. .. .. ..@ time.stamp : POSIXct[1:1], format: "2024-10-02 11:56:55"
  .. .. .. ..@ assay.used : chr "RNA"
  .. .. .. ..@ call.string: chr "RunPCA(mislet_seuratobj, features = VariableFeatures(object = mislet_seuratobj))"
  .. .. .. ..@ params     :List of 11
  .. .. .. .. ..$ assay          : chr "RNA"
  .. .. .. .. ..$ features       : chr [1:2000] "TraesCS3A03G1257300" "TraesCS3D03G1184000" "TraesCS3B03G1502700" "TraesCS6B03G1183800" ...
  .. .. .. .. ..$ npcs           : num 50
  .. .. .. .. ..$ rev.pca        : logi FALSE
  .. .. .. .. ..$ weight.by.var  : logi TRUE
  .. .. .. .. ..$ verbose        : logi TRUE
  .. .. .. .. ..$ ndims.print    : int [1:5] 1 2 3 4 5
  .. .. .. .. ..$ nfeatures.print: num 30
  .. .. .. .. ..$ reduction.name : chr "pca"
  .. .. .. .. ..$ reduction.key  : chr "PC_"
  .. .. .. .. ..$ seed.use       : num 42
  .. ..$ FindNeighbors.RNA.pca   :Formal class 'SeuratCommand' [package "SeuratObject"] with 5 slots
  .. .. .. ..@ name       : chr "FindNeighbors.RNA.pca"
  .. .. .. ..@ time.stamp : POSIXct[1:1], format: "2024-10-02 12:02:33"
  .. .. .. ..@ assay.used : chr "RNA"
  .. .. .. ..@ call.string: chr "FindNeighbors(mislet_seuratobj, dims = 1:10)"
  .. .. .. ..@ params     :List of 16
  .. .. .. .. ..$ reduction      : chr "pca"
  .. .. .. .. ..$ dims           : int [1:10] 1 2 3 4 5 6 7 8 9 10
  .. .. .. .. ..$ assay          : chr "RNA"
  .. .. .. .. ..$ k.param        : num 20
  .. .. .. .. ..$ return.neighbor: logi FALSE
  .. .. .. .. ..$ compute.SNN    : logi TRUE
  .. .. .. .. ..$ prune.SNN      : num 0.0667
  .. .. .. .. ..$ nn.method      : chr "annoy"
  .. .. .. .. ..$ n.trees        : num 50
  .. .. .. .. ..$ annoy.metric   : chr "euclidean"
  .. .. .. .. ..$ nn.eps         : num 0
  .. .. .. .. ..$ verbose        : logi TRUE
  .. .. .. .. ..$ do.plot        : logi FALSE
  .. .. .. .. ..$ graph.name     : chr [1:2] "RNA_nn" "RNA_snn"
  .. .. .. .. ..$ l2.norm        : logi FALSE
  .. .. .. .. ..$ cache.index    : logi FALSE
  .. ..$ FindClusters            :Formal class 'SeuratCommand' [package "SeuratObject"] with 5 slots
  .. .. .. ..@ name       : chr "FindClusters"
  .. .. .. ..@ time.stamp : POSIXct[1:1], format: "2024-10-02 12:02:51"
  .. .. .. ..@ assay.used : chr "RNA"
  .. .. .. ..@ call.string: chr "FindClusters(mislet_seuratobj, resolution = 0.5)"
  .. .. .. ..@ params     :List of 11
  .. .. .. .. ..$ graph.name      : chr "RNA_snn"
  .. .. .. .. ..$ cluster.name    : chr "RNA_snn_res.0.5"
  .. .. .. .. ..$ modularity.fxn  : num 1
  .. .. .. .. ..$ resolution      : num 0.5
  .. .. .. .. ..$ method          : chr "matrix"
  .. .. .. .. ..$ algorithm       : num 1
  .. .. .. .. ..$ n.start         : num 10
  .. .. .. .. ..$ n.iter          : num 10
  .. .. .. .. ..$ random.seed     : num 0
  .. .. .. .. ..$ group.singletons: logi TRUE
  .. .. .. .. ..$ verbose         : logi TRUE
  .. ..$ RunUMAP.RNA.pca         :Formal class 'SeuratCommand' [package "SeuratObject"] with 5 slots
  .. .. .. ..@ name       : chr "RunUMAP.RNA.pca"
  .. .. .. ..@ time.stamp : POSIXct[1:1], format: "2024-10-02 12:05:13"
  .. .. .. ..@ assay.used : chr "RNA"
  .. .. .. ..@ call.string: chr "RunUMAP(mislet_seuratobj, dims = 1:20)"
  .. .. .. ..@ params     :List of 25
  .. .. .. .. ..$ dims                : int [1:20] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. ..$ reduction           : chr "pca"
  .. .. .. .. ..$ assay               : chr "RNA"
  .. .. .. .. ..$ slot                : chr "data"
  .. .. .. .. ..$ umap.method         : chr "uwot"
  .. .. .. .. ..$ return.model        : logi FALSE
  .. .. .. .. ..$ n.neighbors         : int 30
  .. .. .. .. ..$ n.components        : int 2
  .. .. .. .. ..$ metric              : chr "cosine"
  .. .. .. .. ..$ learning.rate       : num 1
  .. .. .. .. ..$ min.dist            : num 0.3
  .. .. .. .. ..$ spread              : num 1
  .. .. .. .. ..$ set.op.mix.ratio    : num 1
  .. .. .. .. ..$ local.connectivity  : int 1
  .. .. .. .. ..$ repulsion.strength  : num 1
  .. .. .. .. ..$ negative.sample.rate: int 5
  .. .. .. .. ..$ uwot.sgd            : logi FALSE
  .. .. .. .. ..$ seed.use            : int 42
  .. .. .. .. ..$ angular.rp.forest   : logi FALSE
  .. .. .. .. ..$ densmap             : logi FALSE
  .. .. .. .. ..$ dens.lambda         : num 2
  .. .. .. .. ..$ dens.frac           : num 0.3
  .. .. .. .. ..$ dens.var.shift      : num 0.1
  .. .. .. .. ..$ verbose             : logi TRUE
  .. .. .. .. ..$ reduction.name      : chr "umap"
  ..@ tools       : list()
Stephen1202-Wang commented 1 week ago

Hi @wsuplantpathology , In this step, scCDC fits a relationship between gene's expression level and gene's entropy in each cell cluster. To fit the relationship, the smooth.spline() function requires at least two distinct data points. If tmp$mean.expr or tmp$entropy have only a single unique value, you will get this error. Could you check if there's one cluster whose gene expressions are the same?

wsuplantpathology commented 6 days ago

Hi Weijian @Stephen1202-Wang , I don't know why, but increasing to a higher dims=1:20 in FindNeighbors and RunUMAP solved the issue, there are 14 factors in @ active.ident, so I guess there must be more dims than 14 factors. Cheers, Chongjing

Stephen1202-Wang commented 6 days ago

Hi Chongjing @wsuplantpathology, The performance of scCDC is closely linked to the quality of cell clustering. Improving the definition of cell clusters can enhance the decontamination results.

Cheers, Weijian