DeepRegNet / DeepReg

Medical image registration using deep learning
Apache License 2.0

Issue #720: enable defining num_parallel_calls in config #724

Closed. mathpluscode closed this pull request 3 years ago.

mathpluscode commented 3 years ago

Description

Fixes #720
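For context, a rough sketch of the behaviour this enables: a config value is forwarded to the num_parallel_calls argument of the tf.data preprocessing step. The function name and defaulting logic below are illustrative, not the exact DeepReg diff:

```python
# Hypothetical sketch of forwarding a config value to tf.data (not the exact DeepReg change).
import tensorflow as tf

def build_dataset(file_paths, preprocess_fn, num_parallel_calls=None):
    # None falls back to tf.data.AUTOTUNE (-1): TensorFlow tunes the parallelism itself.
    if num_parallel_calls is None:
        num_parallel_calls = tf.data.AUTOTUNE
    ds = tf.data.Dataset.from_tensor_slices(file_paths)
    # num_parallel_calls controls how many elements are preprocessed concurrently.
    return ds.map(preprocess_fn, num_parallel_calls=num_parallel_calls)
```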

Type of change

What types of changes does your code introduce to DeepReg?

Please check the boxes that apply after submitting the pull request.

Checklist

Please check the boxes that apply after submitting the pull request.

If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

codecov[bot] commented 3 years ago

Codecov Report

Merging #724 (767447e) into main (6b1f4e0) will not change coverage. The diff coverage is 100.00%.


@@            Coverage Diff            @@
##              main      #724   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           41        41           
  Lines         2480      2484    +4     
=========================================
+ Hits          2480      2484    +4     
Impacted Files                        | Coverage Δ
deepreg/model/network.py              | 100.00% <ø> (ø)
deepreg/dataset/loader/interface.py   | 100.00% <100.00%> (ø)
deepreg/predict.py                    | 100.00% <100.00%> (ø)
deepreg/train.py                      | 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 6b1f4e0...767447e.

fepegar commented 3 years ago

I know very little about parallel processing, but I have some comments.

  1. Is num_cpus descriptive? I would say CPU cores, so maybe num_cores. PyTorch uses num_workers. I'm not sure it's equivalent, though.
  2. Is it a good idea to use all of them by default? What about disabling multiprocessing by default and letting the user enable it by choosing more than one core? As a reference, PyTorch disables multiprocessing when num_workers is 0, and users have to explicitly set num_workers > 0 (see the sketch after this comment). I haven't seen memory issues on the DGX (since February), but when one or more users are training simultaneously using all cores, everything (even ls) becomes much slower. If users set e.g. 10 cores per GPU, I suspect things would run faster for everyone, and that is more likely to happen if users have to set up the multiprocessing settings manually.
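For reference, a minimal sketch of that PyTorch convention (illustrative only, not DeepReg code):

```python
# PyTorch convention: num_workers=0 keeps data loading in the main process
# (no multiprocessing); parallel workers are an explicit opt-in.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.zeros(8, 1))
serial_loader = DataLoader(dataset, batch_size=2, num_workers=0)    # default: no worker processes
parallel_loader = DataLoader(dataset, batch_size=2, num_workers=4)  # explicit opt-in to 4 workers
```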
zacbaum commented 3 years ago

I know very little about parallel processing, but I have some comments.

  1. Is num_cpus descriptive? I would say CPU cores, so maybe num_cores. PyTorch uses num_workers. I'm not sure it's equivalent, though.

In TF (at least for what we're doing here) it is the number of scheduled cores that are used - so num_cores is likely better.

  2. Is it a good idea to use all of them by default? What about disabling multiprocessing by default and letting the user enable it by choosing more than one core? As a reference, PyTorch disables multiprocessing when num_workers is 0, and users have to explicitly set num_workers > 0. I haven't seen memory issues on the DGX (since February), but when one or more users are training simultaneously using all cores, everything (even ls) becomes much slower. If users set e.g. 10 cores per GPU, I suspect things would run faster for everyone, and that is more likely to happen if users have to set up the multiprocessing settings manually.

It's definitely not a good default - especially where most of us have been using DeepReg (on shared systems), as it can lock things up if the machine has few cores / little RAM in general (as is the case with the PTs). In DeepReg, setting the cores available to the data loaders will not change the amount of multiprocessing, etc. that happens through the workers argument in model.fit(); the two can be set independently.
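A hedged sketch of the two separate knobs being discussed (toy data; the values are illustrative, not the DeepReg defaults):

```python
import tensorflow as tf

# Knob 1: parallelism of the tf.data preprocessing pipeline.
ds = tf.data.Dataset.range(64)
ds = ds.map(lambda x: (tf.reshape(tf.cast(x, tf.float32), (1,)),) * 2,
            num_parallel_calls=4)
ds = ds.batch(8)

# Knob 2: the workers / use_multiprocessing arguments of Keras' fit(); these only
# affect Python generator / Sequence inputs and are independent of the setting above.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")
model.fit(ds, epochs=1)  # workers= / use_multiprocessing= would be set here for generator inputs
```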

Aside: The reason ls will be slower is due to RAM being over-allocated. Metadata for these commands is cached in RAM, so when that runs out, the OS has to pull things from disk.

fepegar commented 3 years ago

Thanks, @zacbaum.

Aside: The reason ls will be slower is due to RAM being over-allocated. Metadata for these commands is cached in RAM, so when that runs out, the OS has to pull things from disk.

Are you sure? I've noticed this when all cores are being 100% used, but there's a lot of RAM still available.

zacbaum commented 3 years ago

Are you sure? I've noticed this when all cores are being 100% used, but there's a lot of RAM still available.

Well, if there's no RAM left it will be, but it could also be compounded by a lack of CPU - entirely possible! It's also likely if CPU utilization is above ~90% or so.

mathpluscode commented 3 years ago

@zacbaum @fepegar Hi, I've renamed the vars and changed the default behaviour :) please approve if it looks ok, thx!

mathpluscode commented 3 years ago

We need to be consistent throughout - things refer to parallel calls, CPUs, workers; to someone who has no idea what this means, this isn't helpful.

Can we add some more detail to the docs for this, to also say why someone might use it? (e.g. a shared resource where you don't want to take all of it ;))

I tried again :)

number of elements to process asynchronously in parallel
during preprocessing; -1 (AUTOTUNE) means the parallelism is not limited and is
tuned automatically. Heuristically, it should be set to the number of CPU cores available.

@zacbaum see if it's clear to you now, if not plz propose one lol
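For context, a small sketch of how such a value is typically consumed by tf.data (illustrative; in TF 2.x the constant tf.data.AUTOTUNE equals -1):

```python
import tensorflow as tf

print(tf.data.AUTOTUNE)  # -1: TensorFlow picks the level of parallelism dynamically

ds = tf.data.Dataset.range(100)
# Process up to 8 elements asynchronously in parallel during preprocessing; a common
# heuristic is the number of CPU cores available, or simply AUTOTUNE.
ds = ds.map(lambda x: x * 2, num_parallel_calls=8)
```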

mathpluscode commented 3 years ago

@zacbaum :0 I'm still waiting for approvals :)

fepegar commented 3 years ago

@mathpluscode By default, all CPU cores are used?

zacbaum commented 3 years ago

@zacbaum :0 I'm still waiting for approvals :)

@mathpluscode, I have left a few comments --

I (along with many others) was not working during the Easter break, and certainly not at 11:56pm on a Friday when you tagged me, so you'll have to forgive the wait...

mathpluscode commented 3 years ago

@mathpluscode By default, all CPU cores are used?

By default, CPU usage is already limited to 1 at the very beginning through the environment variables, so num_parallel_calls will be bounded by that.

I insist on not setting both to 1: if the user does not want any limit, they would have to modify two values. As it stands, you can limit the CPUs with a single value, since the other one is bounded by it.
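A heavily hedged sketch of that bounding behaviour (the exact environment variables DeepReg sets are not shown in this thread; OMP_NUM_THREADS below is only an assumed example):

```python
# Assumption: a process-wide thread limit (e.g. OMP_NUM_THREADS=1) is set at startup.
# A larger num_parallel_calls from the config is then effectively capped by that limit,
# so lifting a single value is enough to remove the restriction.
import os

os.environ.setdefault("OMP_NUM_THREADS", "1")   # assumed startup limit

cpu_limit = int(os.environ["OMP_NUM_THREADS"])
num_parallel_calls = 8                          # value from the config
effective_parallelism = min(num_parallel_calls, cpu_limit)
print(effective_parallelism)                    # 1: bounded by the startup limit
```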