desihub / gpu_specter

Scratch work for porting spectroperfectionism extractions to GPUs
BSD 3-Clause "New" or "Revised" License

Algorithmic differences between gpu_specter and specter #78

Closed by marcelo-alvarez 1 year ago

marcelo-alvarez commented 1 year ago

Before running DR1 / Iron on Perlmutter, we need to characterize and understand differences between extracted spectra obtained using gpu_specter and those obtained using specter.

marcelo-alvarez commented 1 year ago

@sbailey @dmargala does this make sense?

[Image: frame-z5-001286802681_comparison]

marcelo-alvarez commented 1 year ago

Adding some context / information that may not be obvious from the above plot (which is showing the residual for the flux, i.e. f1/f2-1, where f1 and f2 are the fluxes being compared).

Generally, the relative difference in flux between gpu_specter on GPU and on CPU is negligible (<1e-7) at a fixed value of nsubbundles. There is, however, a more significant but still small difference, at the level of ~1-2e-3, between gpu_specter and specter, even at fixed nsubbundles. The difference is typically of the same magnitude when nsubbundles changes, but at some wavelengths it can become larger. The most prominent feature is at a wavelength of ~8880, where the relative difference between nsubbundles values of 5 and 6 spikes to ~1e-2, even for specter.
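For reference, the residual f1/f2-1 described above can be sketched in NumPy as follows. This is a minimal illustration, not the plotting script used for the figure: the function name `relative_residual`, the NaN handling for zero flux, and the synthetic flux values are all assumptions made for the example.

```python
import numpy as np

def relative_residual(f1, f2):
    """Return f1/f2 - 1 element-wise, with NaN wherever f2 is zero."""
    f1 = np.asarray(f1, dtype=np.float64)
    f2 = np.asarray(f2, dtype=np.float64)
    ratio = np.full(f1.shape, np.nan)
    # Only divide where the denominator is non-zero; other entries stay NaN.
    np.divide(f1, f2, out=ratio, where=(f2 != 0))
    return ratio - 1.0

# Synthetic fluxes for illustration only (not DESI data).
f2 = np.array([100.0, 200.0, 0.0, 400.0])
f1 = f2 * (1.0 + np.array([1e-7, 2e-3, 0.0, 1e-2]))
res = relative_residual(f1, f2)
```

The injected offsets mirror the three scales quoted in this comment (1e-7, ~2e-3, ~1e-2); the zero-flux pixel comes back as NaN rather than raising a divide-by-zero warning.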

sbailey commented 1 year ago

@marcelo-alvarez thanks. I think this is consistent with the previous studies by Daniel, but it is good to revisit with fresh eyes and the latest code, and then to extend the comparison to redshifts.

Let's plot this normalized by the reported error to focus on differences that are statistically significant, i.e. (f1-f2) * sqrt(ivar1).
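A minimal sketch of that normalization, assuming `f1`, `f2` and `ivar1` are NumPy arrays of the same shape and that `ivar1` is the inverse variance reported with `f1`. The function name `pull` and the example numbers are illustrative assumptions, not part of either code.

```python
import numpy as np

def pull(f1, f2, ivar1):
    """Flux difference in units of the reported 1-sigma error of f1.

    Pixels with ivar1 == 0 give a pull of exactly zero.
    """
    f1 = np.asarray(f1, dtype=np.float64)
    f2 = np.asarray(f2, dtype=np.float64)
    ivar1 = np.asarray(ivar1, dtype=np.float64)
    return (f1 - f2) * np.sqrt(ivar1)

# Illustrative values only: sigma = 0.5 (ivar = 4) for the first three pixels.
f1 = np.array([10.0, 10.5, 9.0, 10.0])
f2 = np.array([10.0, 10.0, 10.0, 12.0])
ivar1 = np.array([4.0, 4.0, 4.0, 0.0])
p = pull(f1, f2, ivar1)

# Fraction of pixels differing by more than 1 sigma of the reported error.
frac_significant = np.mean(np.abs(p) > 1.0)
```

A value of |pull| well below 1 means the two extractions differ by much less than the reported noise, even where the relative difference f1/f2-1 looks large.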

It's good that gpu_specter is self consistent between GPU and CPU.

It looks like specter and gpu_specter are at least as consistent with each other as specter nsubbundles 5 vs. 6, although it's hard to tell whether the red spikes are hiding blue spikes, or whether the blue spikes are rarer but bigger when they occur. Please plot the specter nsubbundles 5 vs. 6 cases separately so that we can check that.

Please also make 2D plots of fiber vs. wavelength color coded by (f1-f2)*sqrt(ivar). That can highlight more rare cases that impact individual fibers or bundles in a way that we wouldn't see by looking at one fiber at a time.
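The 2D view can be sketched as an array of shape (nfiber, nwave) holding (f1-f2)*sqrt(ivar), reduced along the wavelength axis to flag fibers that contain outliers. This is only a sketch under assumptions: the array sizes, the threshold, the injected outlier, and the helper names `pull_map` and `fibers_with_outliers` are hypothetical and not taken from gpu_specter or specter.

```python
import numpy as np

def pull_map(flux1, flux2, ivar):
    """2D array (nfiber, nwave) of (f1 - f2) * sqrt(ivar)."""
    return (flux1 - flux2) * np.sqrt(ivar)

def fibers_with_outliers(pmap, threshold=1.0):
    """Indices of fibers whose largest |pull| exceeds the threshold."""
    max_abs = np.max(np.abs(pmap), axis=1)
    return np.flatnonzero(max_abs > threshold)

# Synthetic frame for illustration only: 50 fibers x 200 wavelength bins.
nfiber, nwave = 50, 200
flux2 = np.full((nfiber, nwave), 100.0)
ivar = np.full((nfiber, nwave), 1.0 / 25.0)   # sigma = 5 everywhere
flux1 = flux2.copy()
flux1[7, 120] += 20.0                          # one injected 4-sigma outlier

pmap = pull_map(flux1, flux2, ivar)
bad = fibers_with_outliers(pmap, threshold=1.0)
```

The `pmap` array can be passed directly to an image plot (for example matplotlib's `imshow`) with fiber on one axis and wavelength on the other; the per-fiber reduction is a quick way to spot the rare cases that a single-fiber plot would miss.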

rainwoodman commented 1 year ago

Round-off errors can bias the mean towards zero. Different chips may perform the accumulation (the additions) in a different order, producing different results. But both results are biased, so their difference may not be a good measure of the discrepancy from the truth. Are there algorithms in the code to mitigate round-off errors? Perhaps a run at higher floating-point precision (e.g. 64-bit instead of 32-bit) could serve as the truth?
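The order dependence mentioned here is easy to reproduce in isolation. The sketch below is a generic floating-point illustration, not code from gpu_specter or specter; the helper `sequential_sum` and the input values are assumptions chosen to make the effect obvious.

```python
import numpy as np

def sequential_sum(values, dtype):
    """Accumulate values left to right, rounding to dtype after each add."""
    total = dtype(0)
    for v in values:
        total = total + dtype(v)
    return total

# The same three numbers, added in two different orders.
forward = [1e8, 1.0, -1e8]
reordered = [1e8, -1e8, 1.0]

# In float32 the spacing between representable values near 1e8 is 8,
# so adding 1.0 to 1e8 is lost to rounding.
sum32_forward = sequential_sum(forward, np.float32)      # 0.0
sum32_reordered = sequential_sum(reordered, np.float32)  # 1.0

# float64 has enough precision to give the exact answer in both orders.
sum64_forward = sequential_sum(forward, np.float64)      # 1.0
sum64_reordered = sequential_sum(reordered, np.float64)  # 1.0
```

This is why a higher-precision run is a reasonable reference: the float64 result is the same in both orders here, while the float32 result depends on the order of accumulation.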

marcelo-alvarez commented 1 year ago

Hi @rainwoodman, I'm not sure about algorithms to mitigate round-off errors. The differences we are focused on here are due to changes in the algorithm itself (either the implementation differing between codes, or the parameters controlling bundling within the same code). There are some differences at the 1e-7 level between GPU and CPU for gpu_specter, which are acceptable, and for many wavelengths the difference is zero. This indicates that the accumulation of round-off error is well under control for gpu_specter, even though it has no explicit mitigation algorithms. That may be because most of the computations themselves are done by numpy/cupy, which do mitigate those kinds of errors.

marcelo-alvarez commented 1 year ago

@sbailey the following plot includes the residual normalized by the rms. I am in the process of making the other plots, as instructed, and will update when they are ready.

[Image: frame-z5-001286802681_gc5]

marcelo-alvarez commented 1 year ago

@sbailey here is the specter nsubbundles 5 vs. 6 case you requested that I plot:

[Image: frame-z5-001286802681_sc5-sc6]

I also include the other pairwise comparisons in case you might want to see those next:

[Image: frame-z5-001286802681_gc5-gg5]
[Image: frame-z5-001286802681_gc5-sc5]
[Image: frame-z5-001286802681_gc5-sc6]
[Image: frame-z5-001286802681_gg5-sc5]
[Image: frame-z5-001286802681_gg5-sc6]

I am working on the 2d plot you requested above and will update when that is ready. Mentioning @julienguy to get him in the loop on the differences at the spectrum level for the latest spectral extraction code, before extending to compare redshifts.

marcelo-alvarez commented 1 year ago

@sbailey here are the 2D plots. I followed your instructions as closely as I could. Where it was ambiguous how to proceed, I made the plots so as to minimize back-and-forth iterations that would require significant changes to my plotting scripts, so I am showing all possible comparisons for the tests I have done so far. I also included the residuals in the form (f2-f1)/[0.5*(f1+f2)], and used the log of the absolute value in each case.
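For clarity, the symmetric residual and the log scaling mentioned above can be written as in the sketch below. It is an illustration under assumptions rather than the actual plotting script: the function names, the `floor` parameter used to avoid log10(0), and the example values are all hypothetical.

```python
import numpy as np

def symmetric_residual(f1, f2):
    """Return (f2 - f1) / [0.5 * (f1 + f2)], NaN where the mean flux is zero."""
    f1 = np.asarray(f1, dtype=np.float64)
    f2 = np.asarray(f2, dtype=np.float64)
    mean = 0.5 * (f1 + f2)
    out = np.full(mean.shape, np.nan)
    # Only divide where the mean flux is non-zero; other entries stay NaN.
    np.divide(f2 - f1, mean, out=out, where=(mean != 0))
    return out

def log_abs(x, floor=1e-16):
    """log10(|x|), with |x| clipped from below at `floor` (hypothetical choice)."""
    return np.log10(np.maximum(np.abs(x), floor))

# Illustrative values only.
f1 = np.array([100.0, 100.0])
f2 = np.array([100.0, 101.0])
res = symmetric_residual(f1, f2)
log_res = log_abs(res)
```

Unlike f1/f2-1, this form is antisymmetric under swapping f1 and f2, so the choice of which extraction is treated as the reference changes only the sign, which the absolute value then removes.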

Hopefully this conveys the information you need to decide next steps on this while avoiding getting lost in all the file names and plots with different scales. As discussed off-github, I will also put this in a mini-report / slides / plots-with-context, to help with clarifying and focusing the discussion here and elsewhere.

In the meantime, please let me know any revisions / any other plots you would like me to make and post here. Thanks.

frame2d-z5-00128680-499_cori-spec-cpu-5-cori-spec-cpu-6 frame2d-z5-00128680-499_prlm-gspc-cpu-5-cori-spec-cpu-5 frame2d-z5-00128680-499_prlm-gspc-cpu-5-cori-spec-cpu-6 frame2d-z5-00128680-499_prlm-gspc-cpu-5-prlm-spec-cpu-5 frame2d-z5-00128680-499_prlm-gspc-cpu-5-prlm-spec-cpu-6 frame2d-z5-00128680-499_prlm-gspc-gpu-5-cori-spec-cpu-5 frame2d-z5-00128680-499_prlm-gspc-gpu-5-cori-spec-cpu-6 frame2d-z5-00128680-499_prlm-gspc-gpu-5-prlm-gspc-cpu-5 frame2d-z5-00128680-499_prlm-gspc-gpu-5-prlm-spec-cpu-5 frame2d-z5-00128680-499_prlm-gspc-gpu-5-prlm-spec-cpu-6 frame2d-z5-00128680-499_prlm-spec-cpu-5-cori-spec-cpu-5 frame2d-z5-00128680-499_prlm-spec-cpu-5-cori-spec-cpu-6 frame2d-z5-00128680-499_prlm-spec-cpu-5-prlm-spec-cpu-6 frame2d-z5-00128680-499_prlm-spec-cpu-6-cori-spec-cpu-5 frame2d-z5-00128680-499_prlm-spec-cpu-6-cori-spec-cpu-6

marcelo-alvarez commented 1 year ago

The impact of these differences downstream to redshifts has been shown to be acceptable, as documented here. Consequently gpu_specter has been the default extraction software in the spectroscopic pipeline since desispec #1883 and will be used going forward on Perlmutter. Closing.