PSLmodels / Tax-Brain

Tax-Brain is an integrator model for PSL tax models
http://taxbrain.pslmodels.org/
MIT License

Idea for improving speed and memory usage #94

Open hdoupe opened 4 years ago

hdoupe commented 4 years ago

Tax-Brain has been somewhat limited on Compute Studio because it has hit memory problems when running the calculations for each year in parallel. Now that C/S supports Dask clusters, we should see how much of a speedup we can get for Tax-Brain. In OG-USA, @jdebacker found that passing a Calculator object from one process to another using the distributed client causes memory problems, but things work fine if you create the Calculator object in the process where the calculations will be run and just advance it to the correct year there (https://github.com/PSLmodels/OG-USA/pull/496#issuecomment-542953090). So, my question is: Can this approach work for Tax-Brain, too?

andersonfrailey commented 4 years ago

@hdoupe, I'm definitely down to try this approach. If I'm understanding the process you're describing correctly, what we'd need to do is create a new function in calculator that will create each calculator object, advance/run that calculator, then pass all the results back for aggregation/presentation. Does that sound about right?

hdoupe commented 4 years ago

Yep, you got it.
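
For illustration, here is a minimal sketch of the pattern agreed on above: each worker builds its own calculators, advances them to the requested year, and sends back only a small result. `ToyCalculator`, `run_one_year`, and `run_all_years` are hypothetical names invented for this sketch, and the toy class is a stand-in so the example runs without Tax-Calculator installed; it is not Tax-Brain's or Tax-Calculator's actual code.

```python
from concurrent.futures import ProcessPoolExecutor


class ToyCalculator:
    """Hypothetical stand-in for a calculator; NOT the Tax-Calculator API."""

    def __init__(self, rate, start_year=2019):
        self.rate = rate
        self.current_year = start_year
        self.income = 1000.0

    def advance_to_year(self, year):
        # Grow income by 2% per year until the requested year is reached.
        while self.current_year < year:
            self.income *= 1.02
            self.current_year += 1

    def total_tax(self):
        return self.rate * self.income


def run_one_year(year, base_rate, reform_rate):
    """Create both calculators inside the worker; return only small results."""
    base = ToyCalculator(base_rate)
    reform = ToyCalculator(reform_rate)
    base.advance_to_year(year)
    reform.advance_to_year(year)
    return {"year": year, "base": base.total_tax(), "reform": reform.total_tax()}


def run_all_years(years, base_rate, reform_rate, max_workers=2):
    """Submit one task per year; only plain arguments and dicts cross processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_one_year, y, base_rate, reform_rate)
                   for y in years]
        return [f.result() for f in futures]


if __name__ == "__main__":
    for row in run_all_years([2020, 2021, 2022], 0.10, 0.12):
        print(row)
```

The point of the design is that no calculator object is ever pickled and shipped between processes; only the year, the rates, and a small dictionary of totals are.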

andersonfrailey commented 4 years ago

Sweet. Definitely down to give it a shot. Do you think this would cause any issues for users running Tax-Brain locally? That would be a lot of calculator creation for a personal computer to handle. Maybe we could add an argument to the run method of TaxBrain that would either run Tax-Brain as it currently runs (only two calculators created) or in this new method, depending on its argument. This might make maintenance a tad bit tougher, but I don't think it'd be a significant challenge.

jdebacker commented 3 years ago

@andersonfrailey You should be able to have this work well locally and on C/S. You can have an argument for the Dask client and have it default to None. Tax-Brain users running on their own machines may never touch it, but you can set it to what you want for Compute-Studio runs.

e.g. in OG-USA's execute.runner() function:

def runner(output_base, baseline_dir, test=False, time_path=True,
           baseline=True, iit_reform={}, og_spec={}, guid='',
           run_micro=True, tax_func_path=None, data=None, client=None,
           num_workers=1):

When calling functions like this for Compute Studio, we create a client in functions.py.
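
As a rough sketch of the `client=None` idea, the function below runs serially when no client is given and submits one task per year otherwise. `run_years` and `compute_year` are hypothetical names, and the per-year work is a placeholder. The only assumption about the client is that it has a `submit` method returning futures with a `result()` method, which is true of both `dask.distributed.Client` and the standard library executors; a thread pool is used here so the example runs without Dask.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_year(year):
    """Placeholder for per-year work; a real model would build a calculator here."""
    return year * 2


def run_years(years, client=None):
    """Run serially when client is None; otherwise submit one task per year."""
    if client is None:
        # Default path for users running on their own machines.
        return [compute_year(y) for y in years]
    # Any client with submit() -> future.result() works here (assumption).
    futures = [client.submit(compute_year, y) for y in years]
    return [f.result() for f in futures]


if __name__ == "__main__":
    print(run_years([2020, 2021]))
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(run_years([2020, 2021], client=pool))
```

Local users never have to pass the argument, while a hosted run can create its own client and hand it in.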