mathpluscode closed this pull request 3 years ago.
Merging #724 (767447e) into main (6b1f4e0) will not change coverage. The diff coverage is 100.00%.

```
@@           Coverage Diff            @@
##              main      #724   +/-  ##
========================================
  Coverage   100.00%   100.00%
========================================
  Files           41        41
  Lines         2480      2484       +4
========================================
+ Hits          2480      2484       +4
```

| Impacted Files | Coverage Δ |
|---|---|
| deepreg/model/network.py | 100.00% <ø> (ø) |
| deepreg/dataset/loader/interface.py | 100.00% <100.00%> (ø) |
| deepreg/predict.py | 100.00% <100.00%> (ø) |
| deepreg/train.py | 100.00% <100.00%> (ø) |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 6b1f4e0...767447e. Read the comment docs.
I know very little about parallel processing, but I have some comments.

- Is `num_cpus` descriptive? I would say CPU cores, so maybe `num_cores`. PyTorch uses `num_workers`. I'm not sure it's equivalent, though.
- Is it a good idea to use all of them by default? What about disabling multiprocessing by default and allowing the user to enable it by choosing >1 cores? Again, as a reference, in PyTorch, if `num_workers` is 0, multiprocessing will be disabled. Users have to explicitly define `num_workers` > 0. I haven't seen memory issues in the DGX (since February), but I think when 1 (or more!) users are training simultaneously using all cores, everything (even `ls`) is much slower. If users set e.g. 10 cores/GPU, I suspect things would run faster for everyone, and this is more likely to happen if users have to manually set up the multiprocessing settings.

> Is `num_cpus` descriptive? I would say CPU cores, so maybe `num_cores`. PyTorch uses `num_workers`. I'm not sure it's equivalent, though.

In TF (at least for what we're doing here) it is the number of scheduled cores that are used, so `num_cores` is likely better.

> Is it a good idea to use all of them by default? What about disabling multiprocessing by default and allowing the user to enable it by choosing >1 cores?
It's definitely not a good standard, especially where most of us have been using DeepReg (on shared systems), as it can lock things up if the machine has few cores / low RAM in general (as is the case with the PTs). In DeepReg, setting the available cores for the data loaders will not change the amount of multiprocessing, etc. that occurs by changing `num_workers` in `model.fit()`. These can be set independently.

Aside: the reason `ls` will be slower is RAM being over-allocated. Metadata for these commands is cached in RAM, so when that runs out, the OS has to pull things from disk.
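As an aside on the `num_workers` convention discussed above, the PyTorch-style rule ("0 disables multiprocessing, >0 explicitly opts in") can be sketched with the standard library. `load_batch` and `_preprocess` below are hypothetical names for illustration only, not DeepReg or PyTorch code:

```python
from multiprocessing import Pool


def _preprocess(x):
    """Stand-in for a per-sample preprocessing step."""
    return x * x


def load_batch(samples, num_workers=0):
    """Sketch of the PyTorch-style convention: num_workers == 0 keeps
    preprocessing in the main process; num_workers > 0 opts in to a
    worker pool (illustrative only)."""
    if num_workers == 0:
        # Multiprocessing disabled: everything runs in the main process.
        return [_preprocess(s) for s in samples]
    # Multiprocessing enabled: fan samples out to num_workers processes.
    with Pool(processes=num_workers) as pool:
        return pool.map(_preprocess, samples)


print(load_batch([1, 2, 3], num_workers=0))  # [1, 4, 9]
```

Both paths produce the same result; only where the work runs differs, which is why an explicit opt-in is a safer default on shared machines.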
Thanks, @zacbaum.

> Aside: the reason `ls` will be slower is RAM being over-allocated. Metadata for these commands is cached in RAM, so when that runs out, the OS has to pull things from disk.

Are you sure? I've noticed this when all cores are being 100% used, but there's a lot of RAM still available.

> Are you sure? I've noticed this when all cores are being 100% used, but there's a lot of RAM still available.

Well, if there's no RAM it will be, but it could also be compounded by lack of CPU. Entirely possible! And also likely if CPU util > 90% or so.
@zacbaum @fepegar Hi, I've renamed the vars and changed the default behaviour :) please approve if it looks ok, thx!
We need to be consistent throughout. Things refer to parallel calls, CPUs, workers; to someone who has no idea what this means, this isn't helpful.

Can we add some more detail to the docs for this, to also say why someone may use it? (e.g. on a shared resource where you don't want to leverage all of it ;))
I tried again :)

> number of elements to process asynchronously in parallel during preprocessing; `AUTOTUNE = -1` means no limit, and heuristically it should be set to the number of CPU cores available.
@zacbaum see if it's clear to you now; if not, please propose one lol
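For context, the semantics described in that docstring can be made concrete with a hypothetical helper (illustrative only, not DeepReg's actual code). In TensorFlow, `tf.data.AUTOTUNE` is literally the constant -1:

```python
import os

AUTOTUNE = -1  # tf.data.AUTOTUNE is this constant in TensorFlow


def resolve_num_parallel_calls(num_parallel_calls: int) -> int:
    """Hypothetical helper interpreting the documented convention.

    -1 (AUTOTUNE) means no explicit limit; otherwise the value is
    heuristically capped at the number of CPU cores available.
    """
    if num_parallel_calls == AUTOTUNE:
        # No explicit limit: use whatever the machine provides.
        return os.cpu_count()
    # Clamp an explicit request to the available cores, minimum 1.
    return max(1, min(num_parallel_calls, os.cpu_count()))
```

For example, `resolve_num_parallel_calls(-1)` returns the machine's core count, while a request far above the core count is clamped down to it.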
@zacbaum :0 I'm still waiting for approvals :)

@mathpluscode By default, all CPU cores are used?

> @zacbaum :0 I'm still waiting for approvals :)

@mathpluscode, I have left a few comments. I (along with many others) was not working during the Easter break, and certainly not at 11:56pm on a Friday when you tagged me, so you'll have to forgive the wait...

> @mathpluscode By default, all CPU cores are used?

By default, CPU is limited to 1 at the very beginning with the definition of the environment variables, so `num_parallel_calls` will be limited by that.

I insist on not setting both to 1: if the user does not want any limit, he/she would have to modify two values. Now you can limit the CPUs with one single value, as the other one is bounded by the previous one.
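A minimal sketch of the approach described here, assuming the environment variables involved are standard thread-count variables such as `OMP_NUM_THREADS` and TensorFlow's `TF_NUM_INTRAOP_THREADS`/`TF_NUM_INTEROP_THREADS` (the exact variable names used in the PR are not shown in this thread):

```python
import os

# Hypothetical sketch: bound CPU use via environment variables defined
# before TensorFlow is imported, so one place limits everything downstream
# (data-loader parallelism is then capped by the same bound).
os.environ["OMP_NUM_THREADS"] = "1"          # OpenMP worker threads
os.environ["TF_NUM_INTRAOP_THREADS"] = "1"   # threads within a single TF op
os.environ["TF_NUM_INTEROP_THREADS"] = "1"   # threads across TF ops

# import tensorflow as tf  # must happen only after the variables are set
```

Because the import order matters, this kind of setup has to run at the very beginning of the entry point, which matches the "limited to 1 at the very beginning" behaviour described above.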
Description
Fixes #720
Type of change
What types of changes does your code introduce to DeepReg?
Please check the boxes that apply after submitting the pull request.
Checklist
Please check the boxes that apply after submitting the pull request.
If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.
- I have run `pre-commit install` and formatted all changed files. If you are not certain, run `pre-commit run --all-files`.
- My commit message follows the format `Issue #<issue number>: detailed message`.