Irrationone / cellassign

Automated, probabilistic assignment of cell types in scRNA-seq data

Error: tensorflow/core/framework/ Allocation of exceeds 10% of system memory #66

Open rosshandler opened 4 years ago

rosshandler commented 4 years ago

Dear cellassign team,

I am trying to run cellassign on a dataset of a little over 400,000 cells. I was able to run some tests with subsets of 100,000 cells, but with the whole dataset I get a tensorflow error saying "Allocation of exceeds 10% of system memory", followed by: Aborted (core dumped). I am not using GPUs, so I also get this message when loading library(cellassign):

Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

I am not sure how to proceed. Perhaps I could try to get access to GPUs, or subset the data for annotation, either randomly or not? Or make a small change to the code to keep the function from crashing? I have plenty of RAM available (700GB).

Please let me know what your suggestions are, and whether it would be helpful to share a more detailed explanation of the errors and code.

Many thanks, Ivan
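
The random subsetting idea mentioned above could look roughly like the sketch below. It only shuffles cell indices and splits them into chunks of 100,000 (the size that worked in the tests); the function name `random_chunks` is hypothetical and not part of cellassign. Each chunk would then have to be annotated in a separate run, and since each run fits its own model, the results may not match a single fit on all cells.

```python
import random


def random_chunks(n_cells, chunk_size, seed=0):
    """Shuffle cell indices and split them into chunks of at most chunk_size.

    Hypothetical helper for illustration; not a cellassign function.
    """
    indices = list(range(n_cells))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    return [indices[i:i + chunk_size] for i in range(0, n_cells, chunk_size)]


# 400,000 cells in chunks of 100,000, as in the tests described above.
chunks = random_chunks(n_cells=400_000, chunk_size=100_000)
print(len(chunks), [len(c) for c in chunks])
```

Every cell index ends up in exactly one chunk, so the per-chunk annotations can be concatenated afterwards.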

Irrationone commented 4 years ago

Hi Ivan,

How many marker genes are you running with? The first message you're getting is just a warning from tensorflow, and the core dump happens when you've run out of all memory. What does your actual memory usage look like when running cellassign?


rosshandler commented 4 years ago

Hi Allen,

I am using several marker genes, because there are several cell types. I am now subsetting the data to simplify the task and use fewer marker genes. When using the whole dataset (400,000 cells), memory usage reaches 10% of the 700GB of RAM and the run stops at that point. I see this in the memory usage. If you are interested, I will let you know how it goes.

Best, Ivan

rargelaguet commented 4 years ago

Hi Allen, I am also running out of CPU memory (>70GB) for a data set with the following characteristics:

"Total number of marker genes" = 665
"Total number of cells" = 29452

Could you explain why cellassign uses so much memory?

Thanks, Ricard.

ayeTown commented 3 years ago

Was anyone able to fix their out of memory issues?

moutazhelal commented 3 years ago

Hi all, I am also getting a similar error:

2021-04-25 12:39:23.614920: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: NVIDIA GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.68GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2021-04-25 12:39:23.619884: W tensorflow/stream_executor/platform/default/] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2021-04-25 12:39:23.620671: W tensorflow/stream_executor/platform/default/] Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found
2021-04-25 12:39:23.620862: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cufft64_10.dll
2021-04-25 12:39:23.621017: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library curand64_10.dll
2021-04-25 12:39:23.621777: W tensorflow/stream_executor/platform/default/] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2021-04-25 12:39:23.622390: W tensorflow/stream_executor/platform/default/] Could not load dynamic library 'cusparse64_10.dll'; dlerror: cusparse64_10.dll not found
2021-04-25 12:39:23.623057: W tensorflow/stream_executor/platform/default/] Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
2021-04-25 12:39:23.623176: W tensorflow/core/common_runtime/gpu/] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
2021-04-25 12:39:23.623937: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-25 12:39:23.624064: I tensorflow/core/common_runtime/gpu/] 0
2021-04-25 12:39:23.624126: I tensorflow/core/common_runtime/gpu/] 0: N
2021-04-25 12:46:33.914495: W tensorflow/core/framework/] OP_REQUIRES failed at : Resource exhausted: OOM when allocating tensor with shape[118,7915,412,10] and type double on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu

Error in py_call_impl(callable, dots$args, dots$keywords) :
  ResourceExhaustedError: OOM when allocating tensor with shape[118,7915,412,10] and type double on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
  [[node Mul_3 (defined at \util\ ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Errors may have originated from an input operation.
Input Source operations connected to node Mul_3:
  Square (defined at \ops\
  Neg (defined at \ops\

Original stack trace for 'Mul_3':
  File "\util\", line 180, in wrapper
    return target(*args, **kwargs)
  File "\ops\", line 334, in multiply
    return gen_math_ops.mul(x, y, name)
  File "\ops\", line 6125, in mul
    "Mul", x=x, y=y, name=name)
  File "\framework\", line 742, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "\framework\", line

In addition: Warning messages:
1: In cellassign(exprs_obj = expr_ob, marker_gene_info = marker_mat_found, :
  Genes with no mapping counts are present. Make sure this is expected -- this can be valid input in some cases (e.g. when cell types are overspecified).
2: In cellassign(exprs_obj = expr_ob, marker_gene_info = marker_mat_found, :

Does anybody have a solution for this?
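
For scale, the size of the tensor named in the OOM message can be worked out from its shape and type alone. The sketch below assumes only that `double` means 8 bytes per element; the helper name `tensor_bytes` is made up for illustration, and the estimate covers this one tensor, not the total memory the run needs.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=8):
    """Bytes needed for a dense tensor; 8 bytes per element for type double."""
    return prod(shape) * bytes_per_element


# Shape copied from the error: "OOM when allocating tensor with shape[118,7915,412,10]".
shape = (118, 7915, 412, 10)
n_bytes = tensor_bytes(shape)
print(f"{n_bytes:,} bytes = {n_bytes / 2**30:.1f} GiB")
# prints: 30,783,651,200 bytes = 28.7 GiB
```

That single allocation is already about 28.7 GiB on the CPU, and because the size is the product of all four dimensions, reducing any one of them shrinks it proportionally.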