PSLmodels / tax-microdata-benchmarking

A project to develop a benchmarked general-purpose dataset for tax reform impact analysis.
https://pslmodels.github.io/tax-microdata-benchmarking/

Seeking ways to speed up L-BFGS-B solution #283

Closed donboyd5 closed 3 weeks ago

donboyd5 commented 3 weeks ago

@martinholmer

Thank you for the chisquare test tool. As noted in issue #279, unfortunately I don't think we'll find many opportunities to save time by reducing the number of targets. I'm interested in finding other ways to speed up solutions.

I have experimented with the number of workers in make_all.py but, unfortunately, found that on my machine (12 cores, 24 threads) increasing it actually slowed things down. With a single worker, all 24 threads were already running at close to 100% most of the time, and increasing the number of workers above 1 was much slower, perhaps because of the additional overhead.

In the short term, I am trying to be extra careful about choosing only essential targets (with some advice from chisquare_test.py), and I am judgmentally adjusting the balance between solution precision and the number of targets (more targets = more time; lower precision = less time). I find that relaxing ftol from the default 1e-9 to 1e-6 gives good solutions most of the time (good = roughly <= 1% absolute error for all targets, though not a formal test). Time permitting, I could create parameter files for specific Congressional districts that reduce ftol in cases where results are not as good as I'd like, although I don't think time will permit that in Phase 5. I have already done 436 Congressional districts this way (same 64 targets as in Phase 4); that took somewhere around 8 hours of dedicated computer time. I want to add Social Security and SALT targets and hope to have that running by Friday morning, ready for me to review on Sunday morning.
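For concreteness, here is roughly where that knob sits, assuming the weights are solved with scipy.optimize.minimize and method="L-BFGS-B"; the toy matrix A, targets b, and objective below are hypothetical stand-ins, not our actual code:

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for the real problem: choose nonnegative weights x so that
# the weighted totals A @ x come as close as possible to the targets b.
rng = np.random.default_rng(0)
A = rng.random((64, 200))          # hypothetical targets-by-records matrix
b = A @ np.full(200, 1.5)          # hypothetical target totals

def objective(x):
    """Sum of squared relative deviations of weighted totals from targets."""
    return np.sum((A @ x / b - 1.0) ** 2)

# Relaxing ftol from the L-BFGS-B default (about 1e-9) to 1e-6 trades some
# precision for a shorter run.
result = minimize(objective, np.ones(200), method="L-BFGS-B",
                  bounds=[(0.0, None)] * 200,
                  options={"ftol": 1e-6, "maxiter": 5000})
print(result.nit, result.fun)
```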

So that's my short term plan.

Middle term (in Phase 6), I'd like to explore a couple of possible ways to speed things up. The first is using a callback function to customize the stopping criteria, something I have done with many other solvers, and I suspect scipy.optimize.minimize allows it. The idea is to design stopping criteria that are "good enough". If scipy.optimize.minimize allows this (many, but not all, optimization frameworks do), we'd keep something like ftol=1e-9 and hope for results that precise, but add an OR condition for stopping: if the solver has run more than, say, 500 iterations AND every target is within +/- 1%, stop and treat that as a good enough solution. That would give us highly precise solutions where they can be reached within 500 (or some other chosen number of) iterations, and otherwise stop as soon as a good solution is reached. Of course, sometimes neither might happen. scipy.optimize.minimize appears to have a callback capability, but I have not investigated it. This approach would benefit any user, regardless of whether they have a GPU.
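Something along these lines, if the callback mechanism works the way I hope. This is only a rough sketch, assuming SciPy 1.11 or later, where a callback written with the intermediate_result signature can raise StopIteration to end the run early; the toy A, b, and objective are the same kind of hypothetical stand-ins as above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A = rng.random((64, 200))          # hypothetical targets-by-records matrix
b = A @ np.full(200, 1.5)          # hypothetical target totals

def objective(x):
    return np.sum((A @ x / b - 1.0) ** 2)

state = {"iterations": 0}

def callback(intermediate_result):
    # SciPy >= 1.11 passes an OptimizeResult when the parameter is named
    # intermediate_result; raising StopIteration terminates the solver.
    state["iterations"] += 1
    pct_err = np.abs(A @ intermediate_result.x / b - 1.0)
    # "Good enough" rule: once past 500 iterations, stop as soon as every
    # target is within +/- 1%, even though ftol=1e-9 would keep going.
    if state["iterations"] > 500 and np.all(pct_err <= 0.01):
        raise StopIteration

result = minimize(objective, np.ones(200), method="L-BFGS-B",
                  bounds=[(0.0, None)] * 200,
                  callback=callback,
                  options={"ftol": 1e-9, "maxiter": 5000})
print(result.message, state["iterations"])
```

The tight ftol still governs the normal stopping rule; the callback only adds the early-out, so the extra cost is one evaluation of the target errors per iteration.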

The second is to see whether we can get scipy and jax running on the GPU. That would benefit me, since, at least in the short term, I may be the main creator of target.csv files, and many other people have GPUs as well. This stackoverflow thread suggests that jax has an implementation of minimize that supports the GPU for BFGS (and if that's true, one would suspect it also supports L-BFGS-B); scipy itself also seems to suggest it can support a GPU for L-BFGS-B. I'd want to explore that.
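If I got that far, a first feasibility probe might look like the sketch below. Caveats and assumptions on my part: as far as I can tell, jax.scipy.optimize.minimize implements BFGS rather than L-BFGS-B and takes no bounds, and it only lands on the GPU if JAX is installed with GPU support; the toy problem is again a hypothetical stand-in:

```python
import jax
import jax.numpy as jnp
from jax.scipy.optimize import minimize

key = jax.random.PRNGKey(0)
A = jax.random.uniform(key, (64, 200))   # hypothetical targets-by-records matrix
b = A @ jnp.full(200, 1.5)               # hypothetical target totals

def objective(x):
    # Same toy objective as above; JAX supplies exact gradients automatically.
    return jnp.sum((A @ x / b - 1.0) ** 2)

# Runs on the default device -- the GPU when JAX is installed with GPU support.
print(jax.devices())
result = minimize(objective, jnp.ones(200), method="BFGS", tol=1e-9)
print(result.fun, result.nit)
```

The tol argument here is not a one-for-one substitute for L-BFGS-B's ftol, and BFGS cannot enforce nonnegative weights, so any results would need to be checked against the CPU runs before relying on this route.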

There are other more-esoteric possible solutions but these two seem like they would be practical to explore.

So I have two questions: (1) do you have further thoughts on this? and (2) do you have interest and time to explore either of these two items in Phase 6? If not, perhaps you could review any exploration I do in this area.

martinholmer commented 3 weeks ago

In issue #283, @donboyd5 asked:

So I have two questions: (1) do you have further thoughts on this?

No.

and (2) do you have interest and time to explore either of these two items in Phase 6?

I have no time. I've already worked way over the contracted hours.