cpu - compare full extraction

dmargala commented 4 years ago

I added a notebook in a new branch "comapre_full_extraction" to compare the output from spex and desi_extract_spectra. It's just a start but I've already identified two issues that seem like quick fixes.

The size of the wavelength array in the outputs differs by 1. Looks like desi_extract_spectra interprets the wavelength range as a closed interval [wmin, wmax] and spex is interpreting it as half open [wmin, wmax).
The flux values in desi_extract_spectra are scaled by a factor of dw

After applying hotfixes for those issues directly on the output in the comparison notebook, I'm able to generate the following image comparing flux values between the two extractions:

The values are mostly within np.allclose of each other. They tend to disagree closer to the patch edges, especially on the the higher wavelength side.

@sbailey, let me know if you have any suggestions for additional comparisons.

sbailey commented 4 years ago

Thanks. Indeed, please submit a PR for both of those bullet points to make the output match desi_extract_spectra: wavelength range as a closed interval, and output units as counts/wavelength instead of just counts.

We can then separately chase the remaining discrepancies. Most likely issue is exactly how many buffer wavelengths are used. In gpu_specter I hardcoded a value from back of the envelope guess; please compare to typical sizes used by original specter (which are calculated per patch and likely vary by a few pixels), and see how the agreement changes with varying the buffer zone in gpu_specter.

Most important is that the fluxes agree within their errors (as stored in the ivar = inverse variance = 1/err^2), but ideally we will also achieve np.allclose.

dmargala commented 4 years ago

I ran another test to check the consistency within desi_extract_spectra when --nwavestep varies. The same issue of discrepancy near the patch border is apparent:

f1 = desi_extract_spectra -w 6000.0,6399.2,0.8 --nwavestep 50 ...
f2 = desi_extract_spectra -w 6000.0,6399.2,0.8 --nwavestep 80 ...

Note that there are a few patch boundaries that seem to do seem to agree.

For comparison, here is the same sort of figure comparing spex and desi_extract_spectra, both with --nwavestep=50.

f1 = desi_extract_spectra -w 6000.0,6399.2,0.8 --nwavestep 50 ...
f2 = spex -w 6000.0,6399.2,0.8 --nwavestep 50 ...

It seems like whatever is causing this is not unique to gpu_specter. @sbailey, any ideas?

sbailey commented 4 years ago

Good check. You are uncovering the dirty underbelly of the divide and conquer method: the pixels at the edges are constrained differently than the pixels in the interior, and so changing the wavestep size changes which pixels are near the edge and thus changes the answer. However, from DESI-0890 figure 10 I was expecting the discrepancies to be ~0.01 sigma (doesn't matter) instead of ~0.1 sigma (kinda big).

So that issue is part of the method, and can be chased separately from gpu_specter.

However, it appears that gpu_specter and specter agree for individual patches when called with the same arguments (good), but disagree by the time we are wrappering those patches and putting them back together, so either the spex wrappering code is calling them with different inputs, or we (I) may have a bug in the reassembly bookkeeping. I'd like to make sure we understand the source of that difference in usage of ex2d_patch.

dmargala commented 4 years ago

My best guess right now for the disagreement between specter and gpu_specter with the same arguments is that get_xyrange returns a different set of pixel coordinates. I created #33 with an example comparing patches as determined by specter and gpu_specter.

dmargala commented 4 years ago

Here is another simple test comparing desi_extract_spectra to itself, using --wavestep=49 instead of the default --wavestep=50. Even this minor change seems to elicit a similar response.

The percentages below show the fraction of flux values within np.isclose of each other as well as the cumulative distribution of the pull statistic.

(f_a, f_b):
   isclose:  34.32%
(f_a - f_b)/sqrt(var_a + var_b):
     1e-05:  20.98%
    0.0001:  40.04%
     0.001:  62.18%
      0.01:  86.40%
       0.1:  99.75%
       1.0: 100.00%

And similarily for spex:

(f_a, f_b):
   isclose:  31.50%
(f_a - f_b)/sqrt(var_a + var_b):
     1e-05:  18.77%
    0.0001:  37.01%
     0.001:  58.57%
      0.01:  83.00%
       0.1:  99.41%
       1.0: 100.00%

desihub / gpu_specter

cpu - compare full extraction #31