LSSTDESC / NaMaster

A unified pseudo-Cl framework
BSD 3-Clause "New" or "Revised" License
56 stars 26 forks source link

Weird issue when caching and reloading workspaces #210

Open ajouellette opened 1 week ago

ajouellette commented 1 week ago

Hi,

I'm running into a very weird issue that I'm having a lot of trouble tracking down the origins of.

I have two functions for caching workspaces like this

def get_workspace(nmt_field1, nmt_field2, nmt_bins, wksp_cache):
    # only need to hash based on masks
    hash_key = joblib.hash([nmt_field1.get_mask(), nmt_field2.get_mask()])
    wksp_file = f"{wksp_cache}/cl/{hash_key}.fits"
    try:
        wksp = nmt.NmtWorkspace.from_file(wksp_file)
        wksp.check_unbinned()
        print("Using cached workspace")
        # update bins and beams after loading
        wksp.update_beams(nmt_field1.beam, nmt_field2.beam)
        wksp.update_bins(nmt_bins)
    except RuntimeError:
        # compute and save to file
        wksp = nmt.NmtWorkspace.from_fields(nmt_field1, nmt_field2, nmt_bins)
        os.makedirs(f"{wksp_cache}/cl", exist_ok=True)
        wksp.write_to(wksp_file)

    return wksp

def get_cov_workspace(nmt_field1a, nmt_field2a, nmt_field1b, nmt_field2b, wksp_cache):
    hash_key = joblib.hash([nmt_field1a.get_mask(), nmt_field2a.get_mask(), nmt_field1b.get_mask(), nmt_field2b.get_mask()])
    wksp_file = f"{wksp_cache}/cov/{hash_key}.fits"
    try:
        wksp = nmt.NmtCovarianceWorkspace.from_file(wksp_file)
        print("Using cached workspace")
    except RuntimeError:
        wksp = nmt.NmtCovarianceWorkspace.from_fields(nmt_field1a, nmt_field2a, nmt_field1b, nmt_field2b)
        os.makedirs(f"{wksp_cache}/cov", exist_ok=True)
        wksp.write_to(wksp_file)

    return wksp

The problem is that when I run my pipeline calculating all of the cross-spectra and covariances between a set of fields it works fine without caching, but fails with caching. Sometimes the code crashes after loading the Cl workspace with a GSL error saying that a matrix is singular. Sometimes the code runs without crashing but produces nonsensical results for the Cls and covariance matrices with NaNs. In the second case, I've noticed that cov_wksp.wsp.lmax_mask often has weird values, for example 4 times large than 3*nside-1.

Also, potentially relevant, I have not seen this issue previously when running similar code (but also older version of namaster) on spin-0 only fields, while the current issues are happening when cross-correlating spin-0 and spin-2 fields.

I'm not sure how to go about debugging this and also have not managed to make a generic example of this bug. Does anyone have any ideas?

damonge commented 1 week ago

Hi @ajouellette . Perhaps the hash should include also the spins of the fields involved (since the mode-coupling coefficients also depend on those). Let me know if that works.

ajouellette commented 1 week ago

That's definitely a good point, but I don't think it matters here since in this case I'm only computing spin-0 - spin-2 cross-spectra.

I managed to get a working example of this bug:

First, I generate mock correlated spin-0 and spin-2 fields with

nside = 1024
ell = np.arange(3*nside)

def gen_sim(cl_kk, cl_gg, cl_kg):
    klm, Elm, Blm = hp.synalm([cl_kk, cl_gg, 0*cl_kg, cl_kg, 0*cl_kg, 0*cl_kg],
                              lmax=3*nside-1, new=True)
    Q, U = hp.alm2map_spin([Elm, Blm], nside, 2, 3*nside-1)
    kappa = hp.alm2map(klm, nside)
    return kappa, Q, U

cl_kk = (1/(ell + 10) + 5e-5*ell) / 1e6
cl_gg = 5e-8/(ell + 10) + 5e-9
cl_gk = 1e-7/(ell + 100)

kappa, e1, e2 = gen_sim(cl_kk, cl_gg, cl_gk)

ra, dec = hp.pix2ang(nside, np.arange(hp.nside2npix(nside)), lonlat=True)
mask = ((ra < 60) + (ra > 300)) * (dec < -20) * (dec > -65)
mask = nmt.mask_apodization(mask, 2)

hp.write_map("test_kappa.fits", kappa)
hp.write_map("test_shear.fits", [e1, e2])
hp.write_map("test_mask.fits", mask)

Then I run this script to calculate the cross-spectrum with and without caching. As long as the workspaces are not loaded from disk, everything works fine, otherwise I get NaNs for both the cross-spectrum and covariance.

ajouellette commented 6 days ago

Some more things I've noticed: