Error: tensorflow/core/framework/cpu_allocator_impl.cc:81: Allocation of exceeds 10% of system memory

rosshandler commented 4 years ago

Dear cellassign team,

I am trying to run cellassign on a dataset of a little over 400,000 cells. I was able to run some tests with subsets of 100,000 cells, but with the whole dataset I get a tensorflow error saying "Allocation of exceeds 10% of system memory", followed by: Aborted (core dumped). I am not using GPUs, so I also get this message when loading library(cellassign):

Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

I am not sure how to proceed. Perhaps I could try to get access to GPUs, subset the data for annotation (randomly or not), or make a small change to the code to keep the function from crashing? I have plenty of RAM available (700GB).

Please let me know what your suggestions are, and whether it would be helpful to share a more detailed explanation of the errors and code.

Many thanks, Ivan
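
The random-subsetting idea mentioned above can be sketched as follows. This is only an illustration of the index bookkeeping, written in Python for brevity; `random_batches` is a hypothetical helper, not part of cellassign, and the batch size of 100,000 is taken from the subsets that reportedly ran successfully. The thread does not say whether fitting each batch separately gives assignments comparable to a single fit on all cells, so treat that as an open assumption.

```python
import random

def random_batches(n_cells, batch_size, seed=0):
    """Shuffle cell indices 0..n_cells-1 and split them into batches.

    Hypothetical helper: each batch of indices would be used to subset
    the expression matrix before a separate annotation run.
    """
    indices = list(range(n_cells))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    return [indices[i:i + batch_size] for i in range(0, n_cells, batch_size)]

# 400,000 cells split into batches of 100,000 (sizes from the post above)
batches = random_batches(n_cells=400_000, batch_size=100_000)
print(len(batches), [len(b) for b in batches])
```

Every cell index appears in exactly one batch, so the per-batch results can be concatenated back into one annotation vector afterwards.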

Irrationone commented 4 years ago

Hi Ivan,

How many marker genes are you running with? The first message you're getting is just a warning from tensorflow, and the core dump happens when you've run out of all memory. What does your actual memory usage look like when running cellassign?

Allen

rosshandler commented 4 years ago

Hi Allen,

I am using several marker genes, because there are several cell types. I am now subsetting the data to simplify the task and use fewer marker genes. When using the whole dataset (400,000 cells), memory usage reaches 10% of the 700GB of RAM and the run stops at that point. I can see this in the memory usage. If you are interested, I will let you know how it goes.

Best, Ivan

rargelaguet commented 4 years ago

Hi Allen, I am also running out of CPU memory (>70GB) for a data set with the following characteristics:

"Total number of marker genes" = 665
"Total number of cells" = 29452

Could you explain why cellassign uses so much memory?

Thanks, Ricard.

ayeTown commented 3 years ago

Was anyone able to fix their out of memory issues?

moutazhelal commented 3 years ago

Hi all, I am also getting a similar error:

2021-04-25 12:39:23.614920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: NVIDIA GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.68GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2021-04-25 12:39:23.619884: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2021-04-25 12:39:23.620671: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found
2021-04-25 12:39:23.620862: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2021-04-25 12:39:23.621017: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2021-04-25 12:39:23.621777: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2021-04-25 12:39:23.622390: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cusparse64_10.dll'; dlerror: cusparse64_10.dll not found
2021-04-25 12:39:23.623057: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
2021-04-25 12:39:23.623176: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1592] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-04-25 12:39:23.623937: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-25 12:39:23.624064: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2021-04-25 12:39:23.624126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2021-04-25 12:46:33.914495: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at cwise_ops_common.cc:82 : Resource exhausted: OOM when allocating tensor with shape[118,7915,412,10] and type double on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
Error in py_call_impl(callable, dots$args, dots$keywords) :
ResourceExhaustedError: OOM when allocating tensor with shape[118,7915,412,10] and type double on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
[[node Mul_3 (defined at \util\dispatch.py:180) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Errors may have originated from an input operation.
Input Source operations connected to node Mul_3:
Square (defined at \ops\gen_math_ops.py:9965)
Neg (defined at \ops\gen_math_ops.py:6304)

Original stack trace for 'Mul_3':
File "\util\dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "\ops\math_ops.py", line 334, in multiply
return gen_math_ops.mul(x, y, name)
File "\ops\gen_math_ops.py", line 6125, in mul
"Mul", x=x, y=y, name=name)
File "\framework\op_def_library.py", line 742, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "\framework\ops.py", line
In addition: Warning messages:
1: In cellassign(exprs_obj = expr_ob, marker_gene_info = marker_mat_found, :
Genes with no mapping counts are present. Make sure this is expected -- this can be valid input in some cases (e.g. when cell types are overspecified).
2: In cellassign(exprs_obj = expr_ob, marker_gene_info = marker_mat_found, :

Does anybody have a solution for this?
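
For a sense of scale, the size of the tensor named in the OOM message can be worked out from its shape and dtype alone. The sketch below is plain arithmetic, not a cellassign or TensorFlow API; `tensor_bytes` is a hypothetical helper, and the thread does not state what each of the four dimensions represents, so no meaning is assigned to them here.

```python
from math import prod

def tensor_bytes(shape, itemsize=8):
    """Bytes needed for one dense tensor; itemsize=8 corresponds to double."""
    return prod(shape) * itemsize

# Shape copied from "OOM when allocating tensor with shape[118,7915,412,10]"
shape = (118, 7915, 412, 10)
n_bytes = tensor_bytes(shape)
print(f"{prod(shape):,} elements, {n_bytes / 2**30:.1f} GiB")
```

That is about 3.85 billion doubles, or roughly 28.7 GiB, for this single intermediate (node Mul_3). Other tensors alive at the same time add to that, so total usage during the run would be higher.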

Irrationone / cellassign

Error: tensorflow/core/framework/cpu_allocator_impl.cc:81: Allocation of exceeds 10% of system memory #66