Closed jack-willturner closed 3 years ago
@jack-willturner DistributedDataParallel was intentional. I found it in an NVIDIA cheat sheet. When you use it on a single machine with multiple GPUs, it assigns one CPU core to each GPU being used, as opposed to one CPU core for all GPUs, which produces a measurable improvement in computational efficiency.
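For reference, the single-machine pattern DistributedDataParallel expects is one worker process per GPU, usually launched with `torch.multiprocessing.spawn`. A minimal sketch (the function name `run_worker` and the port are illustrative, not from this PR; the block is a no-op on a machine without GPUs):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def run_worker(rank, world_size):
    # Each spawned process owns exactly one GPU; nccl is the usual GPU backend.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    model = torch.nn.Linear(10, 10).to(rank)
    model = DDP(model, device_ids=[rank])
    # ... training loop goes here ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    if world_size > 0:
        # One process per visible GPU; spawn passes the rank as the first argument.
        mp.spawn(run_worker, args=(world_size,), nprocs=world_size)
```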
Right, OK.
I've never used DistributedDataParallel before, so I have a few questions: should we just use `nccl` for the backend and hide the choice from the user? Do we have to divide the batch_size by the number of GPUs? Are there any other bits like that to take care of?

Let's keep data parallel as you proposed for now. The distributed data parallel solution takes more time to set up, time that I don't have right now. I'll take care of it, just not today. Let's leave it at a place where it functions.
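On the batch_size question: with DistributedDataParallel each process typically loads its own shard of the data, so the global batch is divided across processes and a `DistributedSampler` keeps the shards disjoint. A hedged sketch (the helper `make_loader` is illustrative; here `world_size` and `rank` are passed explicitly so no process group is needed):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def make_loader(dataset, global_batch_size, world_size, rank):
    # Each process sees global_batch_size // world_size samples per step,
    # and DistributedSampler assigns each rank a disjoint slice of the data.
    per_gpu_batch = global_batch_size // world_size
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    return DataLoader(dataset, batch_size=per_gpu_batch, sampler=sampler)

dataset = TensorDataset(torch.randn(64, 3))
loader = make_loader(dataset, global_batch_size=32, world_size=4, rank=0)
```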
Would be good to go through that cheat sheet at a later date and quantify what kinds of improvements each optimisation brings.
✅ Merging this PR will increase code quality in the affected files by 0.23%.
| Quality metrics | Before | After | Change |
| --- | --- | --- | --- |
| Complexity | 16.81 🙂 | 18.52 😞 | 1.71 👎 |
| Method Length | 103.41 🙂 | 102.00 🙂 | -1.41 👍 |
| Working memory | 17.10 ⛔ | 16.97 ⛔ | -0.13 👍 |
| Quality | 45.46% 😞 | 45.69% 😞 | 0.23% 👍 |

| Other metrics | Before | After | Change |
| --- | --- | --- | --- |
| Lines | 628 | 621 | -7 |

| Changed files | Quality Before | Quality After | Quality Change |
| --- | --- | --- | --- |
| train.py | 29.04% 😞 | 28.31% 😞 | -0.73% 👎 |
| utils/storage.py | 68.14% 🙂 | 71.45% 🙂 | 3.31% 👍 |
Here are some functions in these files that still need a tune-up:
| File | Function | Complexity | Length | Working Memory | Quality | Recommendation |
| --- | --- | --- | --- | --- | --- | --- |
| utils/storage.py | build_experiment_folder | 10 🙂 | 138 😞 | 12 😞 | 50.90% 🙂 | Try splitting into smaller methods. Extract out complex expressions |
| train.py | get_base_argument_parser | 0 ⭐ | 232 ⛔ | 9 🙂 | 58.32% 🙂 | Try splitting into smaller methods |
| utils/storage.py | download_file | 4 ⭐ | 104 🙂 | 11 😞 | 62.58% 🙂 | Extract out complex expressions |
| train.py | train | 1 ⭐ | 95 🙂 | 11 😞 | 66.80% 🙂 | Extract out complex expressions |
| utils/storage.py | restore_model | 5 ⭐ | 75 🙂 | 10 😞 | 67.64% 🙂 | Extract out complex expressions |
The emojis denote the absolute quality of the code.
The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.
Please see our documentation here for details on how these metrics are calculated.
We are actively working on this report - lots more documentation and extra metrics to come!
Let us know what you think of it by mentioning @sourcery-ai in a comment.
We had `DistributedDataParallel` set instead of `DataParallel`. I assume we don't want support for distributed training? I'm not even sure how you would go about it on our machines. Switching to `DataParallel` means that you can now use `num_gpus_to_use` > 1.

I also changed state-dict loading to be more idiomatic (i.e. DataParallel modules are unwrapped before saving, so there is no need to rewrite the names of the dictionary keys when reloading).
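The unwrap-before-save pattern described above looks roughly like this (a sketch; the helper names `save_model`/`load_model` are illustrative and not the actual functions in `utils/storage.py`):

```python
import torch
import torch.nn as nn

_WRAPPERS = (nn.DataParallel, nn.parallel.DistributedDataParallel)

def save_model(model, path):
    # Unwrap (Distributed)DataParallel before saving, so the checkpoint
    # keys carry no "module." prefix.
    to_save = model.module if isinstance(model, _WRAPPERS) else model
    torch.save(to_save.state_dict(), path)

def load_model(model, path):
    # The plain module loads the checkpoint directly; no key renaming needed.
    target = model.module if isinstance(model, _WRAPPERS) else model
    target.load_state_dict(torch.load(path))
    return model
```

This way the same checkpoint restores cleanly whether or not the model is wrapped at load time.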