Closed moustakas closed 1 year ago
This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| py/fastspecfit/util.py | 6 | 8 | 75.0% |
| py/fastspecfit/io.py | 235 | 292 | 80.48% |
| py/fastspecfit/continuum.py | 444 | 632 | 70.25% |
| py/fastspecfit/fastspecfit.py | 33 | 327 | 10.09% |
| py/fastspecfit/emlines.py | 610 | 1294 | 47.14% |
<!-- | Total: | 1331 | 2556 | 52.07% | -->
| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| py/fastspecfit/test/test_fastspecfit.py | 1 | 97.92% |
| py/fastspecfit/emlines.py | 2 | 48.85% |
| py/fastspecfit/continuum.py | 34 | 73.05% |
| py/fastspecfit/io.py | 127 | 68.51% |
| py/fastspecfit/fastspecfit.py | 131 | 13.43% |
<!-- | Total: | 295 | -->
| Totals | |
|---|---|
| Change from base Build 4014896532: | -9.4% |
| Covered Lines: | 2171 |
| Relevant Lines: | 4055 |
This PR builds on #95 and #96 with an eye toward addressing #98, i.e., speed, while also fixing various bugs discovered in testing.
Although the algorithms are essentially identical to those in `v2.0.0`, this PR is a major refactor of how data and methods are passed between multiprocessing cores. In essence, in the `v2.0.0` refactor of the code (compared to `v1.0.1`), I was relying on the idea that I could read the (~few hundred MB) templates and instantiate a one-stop-shop Class (`FastFit`) for all the fitting, and pass both the data and this Class to each individual process without any copying. However, for reasons I still don't fully understand, it appears that the individual processes were making their own local copy of everything (via an inefficient pickling process), which led to significant (and unacceptable) slowdowns.

This PR represents a major refactor of how the data and methods are handled, which leads to an enormous speed-up (a factor of 10 or more in some cases!) compared to `v2.0.0`.

For example, the partial log below illustrates fitting a single healpixel with 1008 targets in approximately 270 s (=4 m 30 s) with 128 cores on perlmutter. (With `v2.0.0`, the fitting would have taken well over 45 m with the same number of cores.) Of this time, roughly 25 s is spent on I/O, 100 s is spent pre-processing the spectra to determine initial line-widths (which could probably be sped up in a future PR), and 145 s is spent on the fitting. If I'm doing the math correctly, this comes out to an average of 18 s/core/object (`145/1008*128=18`) for the fitting, which is roughly consistent with #98.

After a little more testing of this PR, I'm planning to tag, run on a largish sample, and then (hopefully) process all of fuji, guadalupe, and then iron.
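For readers wondering why passing a large object to every process can be so costly: Python's `multiprocessing` serializes task arguments with `pickle` before sending them to a worker, so anything passed along with each task is re-serialized every time. The sketch below is only an illustration of that effect, not the code in this PR. The sizes are hypothetical and deliberately small, and `fitter_state` is a plain dict standing in for the state of a `FastFit`-like object.

```python
import pickle

# Hypothetical sizes, kept small so the sketch runs quickly; the description
# above mentions templates of a few hundred MB.
TEMPLATE_BYTES = 8_000_000
SPECTRUM_BYTES = 64_000

# Stand-in for the state of a one-stop-shop fitting object (not the real FastFit).
fitter_state = {"templates": bytes(TEMPLATE_BYTES)}
spectrum = bytes(SPECTRUM_BYTES)


def payload_bytes(*task_args):
    """Size of the pickled argument tuple, roughly what one task ships to a worker."""
    return len(pickle.dumps(task_args, protocol=pickle.HIGHEST_PROTOCOL))


heavy = payload_bytes(fitter_state, spectrum)  # fitter state travels with every task
light = payload_bytes(spectrum)                # only the per-object data travels

print(f"with fitter state: {heavy / 1e6:.2f} MB per task")
print(f"data only:         {light / 1e6:.3f} MB per task")
print(f"ratio:             {heavy / light:.0f}x")
```

One common way to avoid the repeated cost is to load large read-only data once per worker, for example through the `initializer` and `initargs` arguments of `multiprocessing.Pool`, and to send only the small per-object data with each task. Whether that matches what this PR does is not stated in the description above.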
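The timing numbers quoted above can be checked with a few lines of arithmetic. This is just a sketch using the figures from the description; the variable names are made up for illustration, and the `v2.0.0` time is taken as exactly 45 minutes even though the text says "well over", so the speed-up it gives is a lower bound.

```python
# Figures quoted in the description above.
n_targets = 1008
n_cores = 128
t_io = 25.0           # seconds spent on I/O
t_preprocess = 100.0  # seconds spent estimating initial line-widths
t_fit = 145.0         # seconds spent on the fitting itself

# Total wall-clock time for the healpixel.
t_total = t_io + t_preprocess + t_fit

# Core-seconds spent fitting each object: wall time times cores, divided by objects.
fit_core_seconds_per_object = t_fit * n_cores / n_targets

# Lower bound on the speed-up, assuming v2.0.0 took exactly 45 minutes.
t_v2 = 45 * 60.0
speedup_lower_bound = t_v2 / t_total

print(f"total wall time:       {t_total:.0f} s")
print(f"fitting cost:          {fit_core_seconds_per_object:.1f} core-seconds/object")
print(f"speed-up over v2.0.0:  at least {speedup_lower_bound:.0f}x")
```

The three stages sum to the quoted 270 s, the fitting cost rounds to the quoted 18 s/core/object, and the lower bound on the speed-up agrees with the "factor of 10 or more" claim.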