desihub / QuasarNP

Pure Numpy implementation of QuasarNet
MIT License
0 stars 0 forks source link

Problem with desi_load_coadd #6

Closed echaussidon closed 3 years ago

echaussidon commented 3 years ago

HI @dylanagreen,

I find a mistake in io.desi_load_coadd (maybe not a mistake but it does not work always). The problem comes from this lines https://github.com/desihub/QuasarNP/blob/64e7ff4c7308d252414736f0af9f00bb44c0921a/quasarnp/io.py#L431 where the test rows == None sometimes failed when rows is not None ...

Here what I use to circumvent this problem:

def load_desi_coadd(filename, sel_rows=np.ones(500, dtype=bool)):
    """
    EDM: Copy and adapted from quasarnp.io

    Loads a DESI coadd file for process by utils.process_preds().

    Parameters
    ----------
    filename : :class:'str'
        The fullpath and filename of the coadd file to be loaded.
    sel_rows : :class:'numpy.array'
        A boolean array of which rows to use in a coadd file. If
        none are specified, it will use all rows by default.

    Returns
    -------
    X-out : :class:'numpy.array'
        The coadd data across cameras stitched together for
        processing by utils.process_preds()
    :class:'numpy.array'
        Indices of fibers with non-zero weights.
    """
    cams = ["B", "R", "Z"]
    with fitsio.FITS(filename) as h:
        # Load each cam sequentially, then rebin and merge
        # We will be rebinning down to 443, which is the input size of QuasarNet
        nfibers = sel_rows.sum()
        X_out = np.zeros((nfibers, 443))
        # ivar_out is the weights out, i.e. the ivar, we use this for normalization
        ivar_out = np.zeros_like(X_out) # Use zeros_like so we only have to change one
        for c in cams:
            fluxname,ivarname,wname = f"{c}_FLUX", f"{c}_IVAR", f"{c}_WAVELENGTH"
            # Load the flux and ivar
            flux = h[fluxname].read()[sel_rows, :]
            ivar = h[ivarname].read()[sel_rows, :]
            w_grid = h[wname].read()

            # Rebin the flux and ivar
            new_flux, new_ivar = rebin(flux, ivar, w_grid)

            X_out += new_flux
            ivar_out += new_ivar

    non_zero = ivar_out != 0
    X_out[non_zero] /= ivar_out[non_zero]

    nonzero_weights = np.sum(ivar_out, axis=1) != 0
    #print(f"{nfibers - np.sum(nonzero_weights)} spectra with zero weights")
    X_out = X_out[nonzero_weights]
    ivar_out = ivar_out[nonzero_weights]

    # axis=1 corresponds to the rebinned spectral axis
    # Finding the weighted mean both for normalization and for the rms
    mean = np.average(X_out, axis=1, weights=ivar_out)[:, None]
    rms = np.sqrt(np.average((X_out - mean) ** 2 ,axis=1, weights=ivar_out))[:, None]

    # Normalize by subtracting the weighted mean and dividing by the rms
    # as prescribed in the original QuasarNet paper.
    X_out = (X_out - mean) / rms
    return X_out, np.where(nonzero_weights)[0]
dkirkby commented 3 years ago

The simplest fix is probably to replace rows == None with rows is None. Does that work for you @echaussidon?

echaussidon commented 3 years ago

@dkirkby I didn't even think of that, it naturally works and I get the same result than for "my solution" ! Let me now when you push the correction !

Thank you, Edmond

dylanagreen commented 3 years ago

@echaussidon Thanks for noticing this! I just pushed David's suggested fix.