OK, I just tried and it seems that the source is not the data preprocessing, it's `fit()`.
As discussed in https://github.com/tensorflow/tensorflow/issues/29968, using

```python
import os

# Maximum number of threads to use for OpenMP parallel regions.
os.environ["OMP_NUM_THREADS"] = "1"
# Without setting the two environment variables below, it didn't work for me. Thanks to @cjw85.
os.environ["TF_NUM_INTRAOP_THREADS"] = "1"
os.environ["TF_NUM_INTEROP_THREADS"] = "1"
```

the number of threads is now limited!
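One caveat worth noting (an assumption on my side, not verified in DeepReg): these variables seem to need to be set before TensorFlow is imported/initialised, otherwise the thread pools are already created with the default sizes. TF 2.x also exposes the same knobs programmatically, roughly:

```python
import tensorflow as tf

# Must run before any op executes, i.e. before the intra-/inter-op
# thread pools are created, otherwise it has no effect.
tf.config.threading.set_intra_op_parallelism_threads(1)
tf.config.threading.set_inter_op_parallelism_threads(1)
```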
If we're upgrading to TF2.4 (#721), why not just hold off on this so that we can fully utilize OptimizationOptions which allows a limit to be placed on CPU and RAM usage?
I don't see using many threads as an issue - in fact, it often speeds things up a lot. We just need to have some way of limiting things from taking up too much CPU/RAM. Currently, the default with AUTOTUNE is the number of CPU cores and half the available RAM.
So while these fixes may prevent lots of "dead" or "sleeping" threads, RAM usage has been the bigger issue from what I've observed recently.
This issue is just about providing the possibility to limit the CPUs, as it is not really nice to allocate all CPUs on a cluster. So this is "back-compatible" without changing any existing behaviour.
Regarding the memory, I hope limiting the CPUs will also limit memory usage: if you process one image per CPU, more CPUs means more images in memory. But yes, it'd be nice to wait until you try it.
Regarding TF 2.4, I won't hold any issue just to wait for the upgrade, as we may not support 2.4 anytime soon (<= 2 weeks).
Tested configurations:
- `num_cpus=1` and `num_parallel_calls=1`
- `num_cpus=-1` and `num_parallel_calls=1`
- `num_cpus=-1` and `num_parallel_calls=-1`
OK... it only confirms the fix on the number of CPUs/threads used; the memory seems not to be impacted?
Regarding the concern of @zacbaum: the optimization options do exist in 2.3 (https://www.tensorflow.org/versions/r2.3/api_docs/python/tf/data/experimental/OptimizationOptions), but the `autotune_ram_budget` option from 2.4 (https://www.tensorflow.org/api_docs/python/tf/data/experimental/OptimizationOptions) does not exist there.
Anyway, this issue aims to solve the CPU problem, not the memory problem. If the memory problem is not solved, we will open a new issue.
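For reference, a rough sketch of how the CPU budget could be set on the input pipeline in 2.3 (attribute names are taken from the linked docs and should be double-checked against the installed version; the RAM budget is the 2.4-only part, so it is left commented out):

```python
import tensorflow as tf

options = tf.data.Options()
# TF 2.3: cap the number of cores the tf.data autotuner may use.
options.experimental_optimization.autotune_cpu_budget = 4
# TF 2.4 only: cap the RAM budget (in bytes) used by autotuning.
# options.experimental_optimization.autotune_ram_budget = 4 * 1024 ** 3

dataset = tf.data.Dataset.range(100).map(
    lambda x: x * 2, num_parallel_calls=tf.data.experimental.AUTOTUNE
)
dataset = dataset.with_options(options)
```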
I would be in favour of adding the optimization options, as I mentioned earlier, to budget the CPUs from user input (if it has the same effect as what you've tried so far), and then doing the same for RAM when we move to 2.4. This will just keep things simpler at the end of the day.
Subject of the feature
Currently, we are using AUTOTUNE in data preprocessing, e.g. https://github.com/DeepRegNet/DeepReg/blob/main/deepreg/dataset/loader/interface.py#L113. However, it may take too many CPUs and thus also too much memory, which is not ideal on clusters. Therefore we need to be able to configure `num_parallel_calls`. The fix can be:
- Set `num_parallel_calls` to the given value if provided, otherwise to `tf.data.experimental.AUTOTUNE`.
- Pass `num_parallel_calls` to all funcs using it (see the sketch below).
FYI @YipengHu @zacbaum @fepegar