DeepRegNet / DeepReg

Medical image registration using deep learning
Apache License 2.0

Add option to limit number of CPUs for data preprocessing #720

Closed · mathpluscode closed this issue 3 years ago

mathpluscode commented 3 years ago

Subject of the feature

Currently, we are using AUTOTUNE in data preprocessing, e.g. https://github.com/DeepRegNet/DeepReg/blob/main/deepreg/dataset/loader/interface.py#L113

However, AUTOTUNE may use too many CPUs, and therefore too much memory, which is not ideal on shared clusters. We need to be able to configure this num_parallel_calls.

The fix could be to make num_parallel_calls user-configurable instead of hard-coding AUTOTUNE; a rough sketch is below.
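A minimal sketch of what that could look like (hypothetical function and parameter names, not the current DeepReg API; the default of -1 keeps today's AUTOTUNE behaviour):

```python
import tensorflow as tf


def preprocess(x):
    # placeholder per-sample preprocessing, for illustration only
    return tf.cast(x, tf.float32) / 255.0


def build_dataset(dataset: tf.data.Dataset, num_parallel_calls: int = -1) -> tf.data.Dataset:
    """Map preprocessing with a configurable degree of parallelism.

    num_parallel_calls = -1 falls back to AUTOTUNE (current behaviour);
    a positive value caps the number of parallel calls, i.e. CPUs used.
    """
    if num_parallel_calls == -1:
        num_parallel_calls = tf.data.experimental.AUTOTUNE
    return dataset.map(preprocess, num_parallel_calls=num_parallel_calls)


# e.g. limit preprocessing to a single parallel call on a shared cluster
dataset = build_dataset(tf.data.Dataset.range(8), num_parallel_calls=1)
```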

FYI @YipengHu @zacbaum @fepegar

mathpluscode commented 3 years ago

OK, I just tried and it seems the source is not data preprocessing, it's fit().

As discussed here https://github.com/tensorflow/tensorflow/issues/29968

Using

```python
import os

# Maximum number of threads to use for OpenMP parallel regions.
os.environ["OMP_NUM_THREADS"] = "1"
# Without setting the two environment variables below, it didn't work for me. Thanks to @cjw85
os.environ["TF_NUM_INTRAOP_THREADS"] = "1"
os.environ["TF_NUM_INTEROP_THREADS"] = "1"
```

The number of threads is now limited!


zacbaum commented 3 years ago

If we're upgrading to TF2.4 (#721), why not just hold off on this so that we can fully utilize OptimizationOptions which allows a limit to be placed on CPU and RAM usage?

I don't see using many threads as an issue - in fact, it often speeds things up a lot. We just need to have some way of limiting things from taking up too much CPU/RAM. Currently, the default with AUTOTUNE is the number of CPU cores and half the available RAM.

So while these fixes may prevent lots of "dead" or "sleeping" threads, RAM usage has been the bigger issue from what I've observed recently.

mathpluscode commented 3 years ago

> If we're upgrading to TF2.4 (#721), why not just hold off on this so that we can fully utilize OptimizationOptions which allows a limit to be placed on CPU and RAM usage?
>
> I don't see using many threads as an issue - in fact, it often speeds things up a lot. We just need to have some way of limiting things from taking up too much CPU/RAM. Currently, the default with AUTOTUNE is the number of CPU cores and half the available RAM.
>
> So while these fixes may prevent lots of "dead" or "sleeping" threads, RAM usage has been the bigger issue from what I've observed recently.

This issue just adds the option to limit CPUs, since it is not polite to grab all CPUs on a shared cluster. It is backward-compatible and does not change any existing behaviour.

Regarding memory, I hope limiting CPUs will also limit memory usage: if each CPU processes one image, then more CPUs means more images held in memory at once. But yes, it would be good to verify this once we try it.

Regarding TF2.4, I won't hold any issues back just to wait for the upgrade, as

which means we may not support 2.4 anytime soon (i.e. within <= 2 weeks).

mathpluscode commented 3 years ago

num_cpus=1 and num_parallel_calls=1 (screenshot)

num_cpus=-1 and num_parallel_calls=1 (screenshot)

num_cpus=-1 and num_parallel_calls=-1 (screenshot)

OK... this only confirms that the fix limits the number of CPUs/threads used; memory does not seem to be affected.

mathpluscode commented 3 years ago

Regarding the concern of @zacbaum: the optimization options do exist in 2.3:

https://www.tensorflow.org/versions/r2.3/api_docs/python/tf/data/experimental/OptimizationOptions

But the autotune_ram_budget option from 2.4 (https://www.tensorflow.org/api_docs/python/tf/data/experimental/OptimizationOptions) does not exist in 2.3.
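For completeness, a hedged sketch of what the 2.3 route could look like, assuming autotune_cpu_budget behaves as documented (the budget value is purely illustrative):

```python
import tensorflow as tf

# Sketch under TF 2.3: cap the number of CPU cores tf.data autotuning may use.
# autotune_ram_budget would only become available once we move to TF 2.4.
options = tf.data.Options()
options.experimental_optimization.autotune = True
options.experimental_optimization.autotune_cpu_budget = 2  # illustrative core budget

dataset = (
    tf.data.Dataset.range(8)
    .map(lambda x: x + 1, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    .with_options(options)
)
```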

Anyway, this issue aims to solve the CPU problem, not the memory problem. If the memory problem persists, we can open a new issue.

zacbaum commented 3 years ago

> Regarding the concern of @zacbaum: the optimization options do exist in 2.3:
>
> https://www.tensorflow.org/versions/r2.3/api_docs/python/tf/data/experimental/OptimizationOptions
>
> But the autotune_ram_budget option from 2.4 (https://www.tensorflow.org/api_docs/python/tf/data/experimental/OptimizationOptions) does not exist in 2.3.
>
> Anyway, this issue aims to solve the CPU problem, not the memory problem. If the memory problem persists, we can open a new issue.

I would be in favour of adding the optimization options I mentioned earlier to budget CPUs from user input (if that has the same effect as what you've tried so far), and then doing the same for RAM when we move to 2.4. This would just keep things simpler at the end of the day.