As mentioned in #381 and other issues, running our stage 2 scripts takes forever. Beyond changing solvers, another way I want to speed this process up is to only run the LP model for years whose targets have changed. For example, in the PUF the targets for 2012 to 2017 have not changed and will not change, because the CBO won't be releasing any new data for those years and we won't be changing our SOI estimates. This means the final weights for those years won't change either, so solving for them again is a waste of time and resources.
I'd like to propose skipping these years by adding a check to the stage 2 scripts. For each year, the check compares the targets checked into this repo to the targets on the local machine of whoever is running the scripts, and skips the solver for that year if they're the same. I also want to add a file to the repo that contains the MD5 checksums for `cps-matched-puf.csv` and `cps.csv.gz`. These will be compared to the local files as well, so that if either file changes we'll create new weights for all years. For the years that are skipped, we'll reuse the weights that have already been created.
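A rough sketch of what these checks could look like, assuming the checksums are stored as a JSON file mapping file names to MD5 hex digests, and that the targets for each year can be loaded into a dict keyed by year. The file layout and the function names (`md5sum`, `inputs_changed`, `years_to_solve`) are hypothetical, not anything that exists in the repo yet:

```python
import hashlib
import json
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inputs_changed(checksum_file, data_files):
    """True if any input file's MD5 differs from the checksum stored in the repo.

    Hypothetical format: checksum_file is JSON like {"cps.csv.gz": "<md5>", ...}.
    A file missing from the JSON also counts as changed.
    """
    stored = json.loads(Path(checksum_file).read_text())
    return any(stored.get(Path(p).name) != md5sum(p) for p in data_files)


def years_to_solve(repo_targets, local_targets, inputs_have_changed):
    """Return the years that need the solver.

    Assumes both arguments map year -> that year's targets. If the input data
    changed, every year is re-solved; otherwise only years whose local targets
    differ from the committed ones (or are missing from the repo) are solved.
    """
    years = sorted(local_targets)
    if inputs_have_changed:
        return years
    return [yr for yr in years if repo_targets.get(yr) != local_targets[yr]]
```

Treating a year that has no committed targets as changed means newly added years always get solved.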
A minor downside is that this would effectively lock us into a solver, because the checks wouldn't detect that we had switched to a new one. We can get around this by adding an option to force the model to solve for all years.
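The force option could be a simple command-line flag. This is only a sketch: `--force-all` and `select_years` are hypothetical names, and it assumes the stage 2 scripts are driven from the command line with `argparse`:

```python
import argparse


def parse_args(argv=None):
    """Parse stage 2 options; --force-all bypasses the skip checks."""
    parser = argparse.ArgumentParser(description="Run the stage 2 weight solver.")
    parser.add_argument(
        "--force-all",
        action="store_true",
        help="Re-solve every year even if targets and input checksums are unchanged "
        "(e.g. after switching solvers).",
    )
    return parser.parse_args(argv)


def select_years(all_years, changed_years, force_all):
    """Return every year when forced; otherwise only the years flagged as changed."""
    if force_all:
        return list(all_years)
    return [yr for yr in all_years if yr in changed_years]
```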
I believe these checks should prevent us from skipping any year that actually needs to be re-solved, but if anyone can think of additional checks I should add, please let me know.
cc @donboyd5 @MattHJensen