Closed jdebacker closed 3 years ago
Merging #175 (d59520f) into master (c7e6882) will increase coverage by
0.16%
. The diff coverage is90.62%
.
@@ Coverage Diff @@
## master #175 +/- ##
==========================================
+ Coverage 82.12% 82.29% +0.16%
==========================================
Files 21 21
Lines 1488 1519 +31
==========================================
+ Hits 1222 1250 +28
- Misses 266 269 +3
Flag | Coverage Δ | |
---|---|---|
unittests | 82.29% <90.62%> (+0.16%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
taxbrain/taxbrain.py | 93.42% <90.62%> (-0.53%) |
:arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update c7e6882...d59520f. Read the comment docs.
Thanks, @jdebacker! As long as @hdoupe says it won't cause any issues on CS I'm good with moving to this method of parallelization.
I'm good with this @jdebacker, pending some minimal memory usage profiling!
Good call @hdoupe! Let me do some memory profiling and report results back here. That maybe why y'all haven't already scaled this up more. But still could be nice to have some flexibility to do more than 2 processes at once if a machine's memory allows.
Awesome, thanks @jdebacker. I just need a rough estimate of how much RAM per CPU I need to set up once these changes are live.
Here's a report using the memory_profiler
@profile
decorator:
Serial run time = 119.13826894760132
Parallel run time = 52.473915815353394
Filename: /Users/jason.debacker/repos/Tax-Brain/taxbrain/taxbrain.py
Line # Mem usage Increment Occurences Line Contents
============================================================
114 188.6 MiB 188.6 MiB 1 @profile
115 def run(self, varlist: list = DEFAULT_VARIABLES, client=None, num_workers=1):
116 """
117 Run the calculators. TaxBrain will determine whether to do a static or
118 partial equilibrium run based on the user's inputs when initializing
119 the TaxBrain object.
120
121 Parameters
122 ----------
123 varlist: list
124 variables from the microdata to be stored in each year
125
126 Returns
127 -------
128 None
129 """
130 2345.2 MiB 2156.7 MiB 1 base_calc, reform_calc = self._make_calculators()
131 2345.2 MiB 0.0 MiB 1 if not isinstance(varlist, list):
132 msg = f"'varlist' is of type {type(varlist)}. Must be a list."
133 raise TypeError(msg)
134 2345.2 MiB 0.0 MiB 1 if self.params["behavior"]:
135 if self.verbose:
136 print("Running dynamic simulations")
137 self._dynamic_run(varlist, base_calc, reform_calc)
138 else:
139 2345.2 MiB 0.0 MiB 1 if self.verbose:
140 print("Running static simulations")
141 2345.2 MiB 0.0 MiB 1 start_time = time.time()
142 3441.2 MiB 1096.0 MiB 1 self._static_run(varlist, base_calc, reform_calc)
143 3441.2 MiB 0.0 MiB 1 end_time = time.time()
144 3441.2 MiB 0.0 MiB 1 print('Serial run time = ', end_time - start_time)
145 4390.6 MiB 949.3 MiB 1 base_calc, reform_calc = self._make_calculators()
146 4390.6 MiB 0.0 MiB 1 start_time = time.time()
147 5105.5 MiB 714.9 MiB 1 self._static_run_parallel(varlist, base_calc, reform_calc, client, num_workers)
148 5105.5 MiB 0.0 MiB 1 end_time = time.time()
149 5105.5 MiB 0.0 MiB 1 print('Parallel run time = ', end_time - start_time)
150 5105.5 MiB 0.0 MiB 1 setattr(self, "has_run", True)
151
152 5108.2 MiB 2.7 MiB 1 del base_calc, reform_calc
Nice, thanks @jdebacker. I'm down to give this a go on C/S. I think we should start with 2 workers initially and then we can bump the number once I confirm that the memory usage isn't too crazy.
Ok, the above was from running:
# run TaxBrain
tb_static.run(client=client,num_workers=8)
Memory usage seems higher with 2 or 4 workers:
Serial run time = 36.52572584152222
Parallel run time = 49.60456323623657
Filename: /Users/jason.debacker/repos/Tax-Brain/taxbrain/taxbrain.py
... 142 3507.7 MiB 1020.4 MiB 1 self._static_run(varlist, base_calc, reform_calc) ... 147 7201.3 MiB 2708.7 MiB 1 self._static_run_parallel(varlist, base_calc, reform_calc, client, ...
* 4 workers:
Serial run time = 75.65333104133606 Parallel run time = 56.44440484046936 Filename: /Users/jason.debacker/repos/Tax-Brain/taxbrain/taxbrain.py
... 142 3642.2 MiB 1023.6 MiB 1 self._static_run(varlist, base_calc, reform_calc) ... 147 7829.8 MiB 3197.9 MiB 1 self._static_run_parallel(varlist, base_calc, reform_calc, client, ...
Not sure why memory usage higher with less processes going.
Test results for the _dynamic_run()
:
# create TaxBrain object for dynamic run
tb_dynamic = TaxBrain(start_year, end_year, use_cps=True, behavior={'sub': .4}, reform=reform_url)
# run TaxBrain
tb_dynamic.run(client=client,num_workers=8)
Serial run time = 385.2248160839081
Parallel run time = 301.88964581489563
Filename: /Users/jason.debacker/repos/Tax-Brain/taxbrain/taxbrain.py
Line # Mem usage Increment Occurences Line Contents
============================================================
114 184.7 MiB 184.7 MiB 1 @profile
115 def run(self, varlist: list = DEFAULT_VARIABLES, client=None, num_workers=1):
116 """
117 Run the calculators. TaxBrain will determine whether to do a static or
118 partial equilibrium run based on the user's inputs when initializing
119 the TaxBrain object.
120
121 Parameters
122 ----------
123 varlist: list
124 variables from the microdata to be stored in each year
125
126 Returns
127 -------
128 None
129 """
130 2293.2 MiB 2108.5 MiB 1 base_calc, reform_calc = self._make_calculators()
131 2293.2 MiB 0.0 MiB 1 if not isinstance(varlist, list):
132 msg = f"'varlist' is of type {type(varlist)}. Must be a list."
133 raise TypeError(msg)
134 2293.2 MiB 0.0 MiB 1 if self.params["behavior"]:
135 2293.2 MiB 0.0 MiB 1 if self.verbose:
136 print("Running dynamic simulations")
137 2293.2 MiB 0.0 MiB 1 start_time = time.time()
138 6603.7 MiB 4310.5 MiB 1 self._dynamic_run(varlist, base_calc, reform_calc)
139 6603.7 MiB 0.0 MiB 1 end_time = time.time()
140 6603.7 MiB 0.0 MiB 1 print('Serial run time = ', end_time - start_time)
141 6903.0 MiB 299.3 MiB 1 base_calc, reform_calc = self._make_calculators()
142 6903.0 MiB 0.0 MiB 1 start_time = time.time()
143 6903.0 MiB 0.0 MiB 1 self._dynamic_run_parallel(
144 8241.6 MiB 1338.6 MiB 1 varlist, base_calc, reform_calc, client, num_workers)
145 8241.6 MiB 0.0 MiB 1 end_time = time.time()
146 8241.6 MiB 0.0 MiB 1 print('Parallel run time = ', end_time - start_time)
147 else:
148 if self.verbose:
149 print("Running static simulations")
150 start_time = time.time()
151 self._static_run(varlist, base_calc, reform_calc)
152 end_time = time.time()
153 print('Serial run time = ', end_time - start_time)
154 base_calc, reform_calc = self._make_calculators()
155 start_time = time.time()
156 self._static_run_parallel(varlist, base_calc, reform_calc,
157 client, num_workers)
158 end_time = time.time()
159 print('Parallel run time = ', end_time - start_time)
160 8241.6 MiB 0.0 MiB 1 setattr(self, "has_run", True)
161
162 8011.0 MiB -230.6 MiB 1 del base_calc, reform_calc
Merging #175 (4d3c245) into master (c7e6882) will increase coverage by
0.03%
. The diff coverage is90.90%
.
@@ Coverage Diff @@
## master #175 +/- ##
==========================================
+ Coverage 82.12% 82.15% +0.03%
==========================================
Files 21 21
Lines 1488 1547 +59
==========================================
+ Hits 1222 1271 +49
- Misses 266 276 +10
Flag | Coverage Δ | |
---|---|---|
unittests | 82.15% <90.90%> (+0.03%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
taxbrain/taxbrain.py | 92.78% <90.69%> (-1.17%) |
:arrow_down: |
taxbrain/tests/test_brain.py | 93.75% <100.00%> (+0.07%) |
:arrow_up: |
cs-config/cs_config/constants.py | 100.00% <0.00%> (ø) |
|
cs-config/cs_config/tests/test_functions.py | 100.00% <0.00%> (ø) |
|
cs-config/cs_config/helpers.py | 47.52% <0.00%> (+0.15%) |
:arrow_up: |
cs-config/cs_config/functions.py | 91.74% <0.00%> (+0.48%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update c7e6882...4d3c245. Read the comment docs.
@jdebacker this looks good to me. I won't have time to fiddle with the number of workers and cpu / memory ratios until Tuesday, but I'm good with merging this whenever @andersonfrailey is.
@hdoupe No rush on that. Shall I hold off here and open another PR to work on the functions.py
for the CS app? I can do that in a separate PR when you are ready.
Sorry for the delay here. The end of the semester got a bit busy. I'm good with merging this, but before I do, any last changes/things that should be addressed before I do? @jdebacker @hdoupe
I do not have any further changes.
LGTM @andersonfrailey. Thanks @jdebacker. I'm excited to play with this
This PR proposes a different approach to parallelizing processes in the
TaxBrain.run()
method. In this approach, up to 2 x number of years in budget window are done simultaneously, in contrast to the 2 at a time done currently.The method does add a private method and take a few more lines of code.
In testing, the increase in time was about 25% for a 10 year budget window. From a couple different runs (where "serial run time" is the current method and "parallel run time" is the proposed approach:
and
@andersonfrailey let me know what you think of moving from
_static_run
to something like_static_run_parallel
@hdoupe Is Tax-Brain parallelized on C/S? Any advantages or draw backs on that platform from the method proposed here?