Please check whether all three cell types exist in the new dataset after downsampling.
Yes, all three types are present in both the ref and query datasets:
> seurat.ref <- seurat.ref[,sample(colnames(seurat.ref),size=5000,replace=F)]
> table(seurat.ref$celltype)
GM12878 HEPG2 SKBR3
1338 2990 672
> table(seurat.query$orig.ident)
GM12878 HEPG2 SKBR3
800 16623 3548
BTW, my strategy was to fix the cell numbers of 2 of the 3 cell types and downsample the remaining one to 8/20/40/80/800 cells, to determine the limits of scMAGIC (for example, downsample GM12878 to 8/20/40/80/800 cells and keep HEPG2/SKBR3 unchanged). Initially, I used the dataset in which one cell type was downsampled to only 20 cells and got the error I described. I thought it might be because the cell number was too small, so I tested the 800-cell version, but the error was still there. 800 cells should be enough for annotation, I suppose.
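In case it is useful, a rough sketch of that subsetting (just an illustration; the metadata column name follows the tables in this thread, and the helper name is made up):
downsample_one_type <- function(obj, target_type, n, type_col = "orig.ident") {
  # keep the other cell types unchanged
  types <- obj@meta.data[[type_col]]
  keep  <- colnames(obj)[types != target_type]
  # randomly downsample the target cell type to n cells
  pool  <- colnames(obj)[types == target_type]
  obj[, c(keep, sample(pool, size = min(n, length(pool)), replace = FALSE))]
}
# e.g. GM12878 down to 20 cells, HEPG2/SKBR3 untouched:
# seurat.small <- downsample_one_type(seurat.query, "GM12878", 20)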
Besides, it does not seem to be related to the data size. I tried 3000 cells (containing all 3 cell types), and it still reported the error:
> seurat.query <- seurat.query[,sample(colnames(seurat.query),size=3000,replace=F)]
> table(seurat.query$orig.ident)
GM12878 HEPG2 SKBR3
1849 195 956
> table(seurat.ref$celltype)
GM12878 HEPG2 SKBR3
1369 3011 620
> seurat.ref <- seurat.ref[,sample(colnames(seurat.ref),size=3000,replace=F)]
> table(seurat.ref$celltype)
GM12878 HEPG2 SKBR3
832 1789 379
> seurat.query <- scMAGIC_Seurat(seurat.query, seurat.ref, atlas = 'HCL', corr_use_HVGene = 3000)
[1] "Sum single cell counts matrix:"
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
[1] "Number of overlapped genes:"
[1] 27626
[1] "Start clustering :"
starting worker pid=189147 on localhost:11713 at 08:20:10.650
starting worker pid=189148 on localhost:11713 at 08:20:10.650
starting worker pid=189146 on localhost:11713 at 08:20:10.650
starting worker pid=189149 on localhost:11713 at 08:20:10.652
Attaching SeuratObject
Attaching SeuratObject
Attaching SeuratObject
Attaching SeuratObject
[1] "Clustering completed!"
[1] "Find marker genes of cell types in reference:"
starting worker pid=200399 on localhost:11713 at 08:20:57.471
starting worker pid=200400 on localhost:11713 at 08:20:57.478
starting worker pid=200398 on localhost:11713 at 08:20:57.504
starting worker pid=200401 on localhost:11713 at 08:20:57.507
Attaching SeuratObject
Attaching SeuratObject
Attaching SeuratObject
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: Cell group 2 is empty - no cells with identity class
In addition: Warning messages:
1: In eval(predvars, data, env) : NaNs produced
2: In hvf.info$variance.expected[not.const] <- 10^fit$fitted :
number of items to replace is not a multiple of replacement length
I have tried to make some changes to solve the problem. Please reinstall scMAGIC and check whether it can run successfully.
Thank you so much for updating. And yes, this error has been fixed. Could you please tell me what you've done?
BTW, I encountered another error related to RAM usage. It may occur when dealing with a large dataset; when I use a smaller dataset, it goes away.
Error in checkForRemoteErrors(val) :
one node produced an error: The total size of the 3 globals exported for future expression ('FUN()') is 557.49 MiB.. This exceeds the maximum allowed size of 500.00 MiB (option 'future.globals.maxSize'). There are three globals: 'data.use' (557.41 MiB of class 'S4'), 'j' (64.98 KiB of class 'numeric') and 'FUN' (14.94 KiB of class 'function')
And I found the solution by adding this option:
options(future.globals.maxSize = 1000 * 1024^2)
from https://satijalab.org/seurat/articles/future_vignette.html
Maybe you could add this setting to the package. But be careful: my job got killed on the server, as this option may consume too many resources.
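For context, the whole workaround is just setting the option before the call; the value is in bytes (the default is 500 MiB, as the error message says) and probably needs to be scaled to the matrix size and the RAM actually available:
options(future.globals.maxSize = 1000 * 1024^2)  # raise the limit to ~1 GiB (in bytes)
seurat.query <- scMAGIC_Seurat(seurat.query, seurat.ref, atlas = 'HCL', corr_use_HVGene = 3000)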
BTW, I also tested on a small dataset (~200 cells), and an error was returned:
[1] "Sum single cell counts matrix:"
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
[1] "Number of overlapped genes:"
[1] 27626
[1] "Start clustering :"
starting worker pid=76173 on localhost:11302 at 09:38:10.134
starting worker pid=76171 on localhost:11302 at 09:38:10.134
starting worker pid=76172 on localhost:11302 at 09:38:10.135
starting worker pid=76170 on localhost:11302 at 09:38:10.136
Attaching SeuratObject
Attaching SeuratObject
Attaching SeuratObject
Attaching SeuratObject
Warning in irlba(A = t(x = object), nv = npcs, ...) :
You're computing too large a percentage of total singular values, use a standard svd instead.
Warning in irlba(A = t(x = object), nv = npcs, ...) :
You're computing too large a percentage of total singular values, use a standard svd instead.
Warning in irlba(A = t(x = object), nv = npcs, ...) :
You're computing too large a percentage of total singular values, use a standard svd instead.
Warning in irlba(A = t(x = object), nv = npcs, ...) :
You're computing too large a percentage of total singular values, use a standard svd instead.
Warning in irlba(A = t(x = object), nv = npcs, ...) :
You're computing too large a percentage of total singular values, use a standard svd instead.
Warning in irlba(A = t(x = object), nv = npcs, ...) :
You're computing too large a percentage of total singular values, use a standard svd instead.
Warning in irlba(A = t(x = object), nv = npcs, ...) :
You're computing too large a percentage of total singular values, use a standard svd instead.
Warning in irlba(A = t(x = object), nv = npcs, ...) :
You're computing too large a percentage of total singular values, use a standard svd instead.
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: Error: You should provide a smaller resolution!
Calls: scMAGIC_Seurat ... clusterApply -> staticClusterApply -> checkForRemoteErrors
In addition: Warning messages:
1: In eval(predvars, data, env) : NaNs produced
2: In hvf.info$variance.expected[not.const] <- 10^fit$fitted :
number of items to replace is not a multiple of replacement length
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Execution halted
Execution halted
Execution halted
Warning message:
Maybe it would be better to allow important parameters (like resolution) to be passed to the functions.
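Just to illustrate the suggestion (this is not the actual scMAGIC code, only a sketch of the kind of pass-through I mean), the clustering step could accept the resolution and forward it to Seurat:
# hypothetical wrapper, not scMAGIC's real internals
cluster_query <- function(seurat.obj, resolution = 0.8, ...) {
  seurat.obj <- Seurat::FindNeighbors(seurat.obj, dims = 1:10)
  Seurat::FindClusters(seurat.obj, resolution = resolution, ...)
}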
It was a parameter setting problem: I didn't consider the situation where there are fewer than four cell types.
Thanks very much for your suggestion!
Hi,
I found another tricky issue. I was subsetting the three-cell-type dataset into 3000/2000/1000/500/300/100 cells of each type to see how many cells in the ref are enough for annotation. I initially used those subset datasets as the ref and a randomly sampled 5000 cells as the query. For example, ref (3000 cells per type):
GM12878 HEPG2 SKBR3
3000 3000 3000
query (randomly sampled 5000 cells):
GM12878 HEPG2 SKBR3
1333 3042 625
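(For reference, a sketch of how such a fixed-per-type ref can be built; I assume the cell type labels are in the `celltype` metadata column, and the helper name is made up:)
subset_per_type <- function(obj, n_per_type, type_col = "celltype") {
  types <- obj@meta.data[[type_col]]
  # sample up to n_per_type cells from each cell type
  cells <- unlist(lapply(unique(types), function(ct) {
    pool <- colnames(obj)[types == ct]
    sample(pool, size = min(n_per_type, length(pool)), replace = FALSE)
  }))
  obj[, cells]
}
# e.g. a ref with 3000 cells of each of GM12878/HEPG2/SKBR3:
# seurat.ref.3000 <- subset_per_type(seurat.ref, 3000)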
Running scMAGIC with this ref and query returns an error:
[1] "Find marker genes of cell types in reference:"
starting worker pid=55718 on localhost:11689 at 07:18:43.576
starting worker pid=55716 on localhost:11689 at 07:18:43.577
starting worker pid=55717 on localhost:11689 at 07:18:43.579
starting worker pid=55715 on localhost:11689 at 07:18:43.582
Attaching SeuratObject
Attaching SeuratObject
Attaching SeuratObject
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: Cell group 2 is empty - no cells with identity class
Calls: scMAGIC_Seurat ... clusterApply -> staticClusterApply -> checkForRemoteErrors
In addition: Warning messages:
1: In eval(predvars, data, env) : NaNs produced
2: In hvf.info$variance.expected[not.const] <- 10^fit$fitted :
number of items to replace is not a multiple of replacement length
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
And the same error appears when the 100-cell ref is used.
But when I do it the other way around (3000 cells of each type as the query and the randomly sampled 5000 cells as the ref), no error occurred.
Actually, I always get an error when using the 3000-cells-per-type dataset as the ref, while the randomly sampled 5000 cells work almost every time (well, it can sometimes fail when there are fewer than 4 cells of a cell type in the query). Very tricky, isn't it?
Are there any special requirements on the ref or query dataset (the number of cell types like you mentioned, or the number of cells)?
BTW, a new error occurred when using the 1000-cells-per-type dataset as the query:
[1] "Build local reference"
starting worker pid=36169 on localhost:11257 at 08:23:52.549
starting worker pid=36172 on localhost:11257 at 08:23:52.549
starting worker pid=36171 on localhost:11257 at 08:23:52.553
starting worker pid=36170 on localhost:11257 at 08:23:52.559
Package 'mclust' version 5.4.8
Type 'citation("mclust")' for citing this R package in publications.
Package 'mclust' version 5.4.8
Type 'citation("mclust")' for citing this R package in publications.
Package 'mclust' version 5.4.8
Type 'citation("mclust")' for citing this R package in publications.
Error in checkForRemoteErrors(val) :
one node produced an error: missing value where TRUE/FALSE needed
Calls: scMAGIC_Seurat ... clusterApply -> staticClusterApply -> checkForRemoteErrors
In addition: There were 16 warnings (use warnings() to see them)
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Execution halted
I guess the problem may be related to the number of cell types. Because there are usually more than 3 cell types in a real single-cell annotation scenario, I hadn't encountered the error before. I will test this situation to make scMAGIC more robust.
I tested some examples with a 3-cell-type reference, but I didn't encounter the error. Could you please send me the download links for these data?
Sure, here are the IDs in GEO: GSM5709379, GSM4471657, GSM3596321.
Please check the corresponding cell types when downloading them.
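(The supplementary files can be fetched e.g. with the GEOquery Bioconductor package; the file formats differ between samples, so please check each record on GEO before loading.)
library(GEOquery)
# download the supplementary files of each GEO sample into its own directory
for (gsm in c("GSM5709379", "GSM4471657", "GSM3596321")) {
  getGEOSuppFiles(gsm)
}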
I found that GM12878 and SKBR3 are count data while the HepG2 data is normalized, which produces NAs and then leads to the error. Although the NAs are omitted in the latest scMAGIC, I suggest keeping the input format consistent.
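A quick heuristic check (just a sketch, not part of scMAGIC): raw counts should be non-negative whole numbers, so fractional values usually indicate normalized or log-transformed data.
looks_like_counts <- function(mat, n_cells = 500) {
  # inspect a small subset of cells for speed; works for dense and sparse matrices
  sub <- as.matrix(mat[, seq_len(min(n_cells, ncol(mat))), drop = FALSE])
  all(sub >= 0) && all(abs(sub - round(sub)) < 1e-8)
}
# e.g. looks_like_counts(Seurat::GetAssayData(seurat.ref, slot = "counts"))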
Thanks for replying; I will give it another shot soon.
Hi, Drizzle
I was playing around with scMAGIC on an artificial dataset which contains three human cell types: GM12878, HEPG2, and SKBR3. I was trying to determine the limits of scMAGIC, so I downsampled this dataset as the query data and used the original data as the reference data. But it returns an error that says:
I googled a little and found a few possible causes:
The only possible solution in my mind is to find another, different dataset as the ref. Do you have any other advice on that?