Closed utf closed 7 months ago
Thanks for flagging this @utf (and producing a solution 🙌 )!
I'll run some checks to see how dependent the (fixed) multiprocessing speedup is on the size of DefectsSet, and update accordingly (e.g. for defects generation, testing showed a tradeoff depending on the number of inequivalent sites etc).
Hi @utf!
I've had time to properly look at this now. Tbh, I'm not totally sure what's going on or how it was that slow on your setup.
In the test cases I'd been looking at before, multiprocessing did give a decent speedup in certain cases (e.g. from like 40/50 seconds to 10/20 seconds for CdTe and LiMn1.5Ni0.5O4). I was running on an M1 Pro 2021 MacBook, 32 GB RAM.
Possibly a memory issue that I'm not suffering from, but I would've guessed your laptop is more powerful than mine.
After doing some profiling, I've now updated the POTCAR generation (5a07c80) to use caching, which greatly speeds up the file generation/writing steps. With the reduced computational load from this, I find multiprocessing gives less of a speedup than before, but it's still faster than without multiprocessing, especially as the number of folders to write grows (at least once it's above ~30 defect folders).
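To illustrate why caching helps here, a minimal sketch, assuming the expensive step is a per-element lookup that many defect folders share. `load_potential` and `build_potcar` are hypothetical stand-ins, not the functions changed in 5a07c80:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def load_potential(symbol: str) -> str:
    """Hypothetical stand-in for an expensive per-element read/parse step."""
    return f"<potential data for {symbol}>"


def build_potcar(symbols: tuple) -> str:
    """Concatenate the (cached) per-element data for one folder."""
    return "\n".join(load_potential(symbol) for symbol in symbols)


# Many defect folders contain the same elements, so most lookups hit the cache.
folders = [("Cd", "Te")] * 50 + [("Cd",), ("Te",)]
potcars = [build_potcar(symbols) for symbols in folders]

info = load_potential.cache_info()
print(f"expensive loads: {info.misses}, cache hits: {info.hits}")
```

With the per-folder cost reduced like this, there is less work left to parallelise, which fits the smaller multiprocessing gain described above.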
As well, on my system, it's still a good bit faster with Pool rather than ThreadPool (and ThreadPool is actually a bit slower than with no multiprocessing).
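For anyone wanting to reproduce this kind of comparison outside doped, here is a rough sketch that times the same folder-writing job serially, with ThreadPool and with Pool. `write_folder` and `write_all` are hypothetical helpers writing placeholder files, so the absolute numbers won't match real input set writing:

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool
from pathlib import Path


def write_folder(folder: str) -> str:
    """Create one folder and write a small placeholder input file into it."""
    path = Path(folder)
    path.mkdir(parents=True, exist_ok=True)
    (path / "INCAR").write_text("# placeholder input file\n")
    return folder


def write_all(folders: list, mode: str = "serial", workers: int = 4) -> float:
    """Write every folder with the chosen strategy; return elapsed seconds."""
    start = time.perf_counter()
    if mode == "serial":
        for folder in folders:
            write_folder(folder)
    elif mode == "threads":
        with ThreadPool(workers) as pool:
            pool.map(write_folder, folders)
    elif mode == "processes":
        with Pool(workers) as pool:
            pool.map(write_folder, folders)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return time.perf_counter() - start
```

When using the "processes" mode from a script, call `write_all` from under an `if __name__ == "__main__":` guard, since platforms that spawn worker processes (macOS by default) re-import the main module.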
E.g. with CdTe (all auto-generated intrinsic defects), it's around 7/8 seconds with the auto multiprocessing setup, and around 10 seconds with ThreadPool or with no multiprocessing.
With LiMn1.5Ni0.5O4, it's around 50/60 seconds with ThreadPool / no multiprocessing, and ~24 seconds with the default Pool multiprocessing.
If you had time at some point, would you mind trying this with the current develop branch of doped (just running this code, with and without processes=1 to compare the timings) to see if whatever the issue was is fixed, or if it's a system-dependent thing that needs to be accounted for? Thanks! 🙌
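from pymatgen.core import Structure

from doped.generation import DefectsGenerator
from doped.vasp import DefectsSet

cdte = Structure.from_file("CdTe_POSCAR")  # bulk CdTe structure file
defect_gen = DefectsGenerator(cdte)  # generate the intrinsic defects
ds = DefectsSet(defect_gen)  # build the VASP input sets for each defect
ds.write_files("test_pop")  # write the defect folders to test_pop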
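One way to get comparable numbers from the two runs is a small timing wrapper. This is only a sketch: `time_call` and `dummy_write` are hypothetical, and `dummy_write` stands in for the real write step so the example runs without doped installed. Which doped call takes `processes` isn't stated in this thread, so it is shown only on the stand-in:

```python
import time
from typing import Any, Callable, Tuple


def time_call(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run func(*args, **kwargs) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def dummy_write(output_path: str, processes=None) -> str:
    """Hypothetical stand-in for the real writing step; just sleeps briefly."""
    time.sleep(0.01)
    return output_path


# Time the same call with the default settings and with processes=1.
for label, kwargs in (("default", {}), ("processes=1", {"processes": 1})):
    _, seconds = time_call(dummy_write, "test_pop", **kwargs)
    print(f"{label}: {seconds:.2f} s")
```

Using `time.perf_counter` keeps the measurement to wall-clock time for the write step alone, leaving out defect generation.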
Makes sense that ThreadPool would be quicker if file I/O (rather than CPU) was the limiting factor, so not sure if that gives any hints about the differences in behaviour here.
Hi @kavanase, I just tested the code in the develop branch and it brought the time down to 8 seconds. Not sure what the issue was before but it seems to be fixed (could have also been some weird issue on my end).
Ok sick, thanks for checking Alex!
Thanks so much for digging into this.
No problemo, thanks for raising!
The default multiprocessing options actually slow down input set writing on my M1 Max Mac.
Setup
Writing with multiprocessing
Time taken: 5 minutes 1 second
Writing without multiprocessing
Time taken: 14 seconds
Fix
Using multiple threads instead of multiple processes seems to fix the issue.
Time taken: 13 seconds
However, the timing doesn't substantially improve on no multiprocessing.