LSSTDESC / rail_sklearn

RAIL algorithms that depend on scikit-learn.
MIT License

NZDir fails if data is larger than chunk_size #11

Closed sschmidt23 closed 6 months ago

sschmidt23 commented 6 months ago

Irene Moskowitz messaged me pointing out that NZDir was failing for a dataset she was attempting to run, returning a qp ensemble with NaN for every entry. I ran the demo notebook and it ran fine with the default data, but failed in the way Irene described when I used a larger dataset of my own. I noticed that the demo notebook's three samples are all smaller than the default chunk_size of 10,000; if I set chunk_size=1000, the demo notebook fails with the error:

```
/Users/sam/anaconda3/envs/xtpz/lib/python3.10/site-packages/qp/hist_pdf.py:80: RuntimeWarning: invalid value encountered in divide
  self._hpdfs = (pdfs_2d.T / sums).T
```
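For reference, here is a minimal numpy sketch (not the qp code itself) of why that line produces NaNs: if any row of the 2-D pdf array sums to zero, the per-row normalization is a 0/0 division, which emits exactly this RuntimeWarning and fills the row with NaN.

```python
import numpy as np

# Minimal reproduction of the symptom, mimicking qp's
# `self._hpdfs = (pdfs_2d.T / sums).T` normalization step.
pdfs_2d = np.array([
    [1.0, 2.0, 1.0],   # a normally filled histogram row
    [0.0, 0.0, 0.0],   # a row that never got data, as in the bad chunks
])
sums = pdfs_2d.sum(axis=1)      # second sum is 0
hpdfs = (pdfs_2d.T / sums).T    # RuntimeWarning: invalid value in divide
print(hpdfs)                    # second row is [nan, nan, nan]
```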

So, it appears that there is a bug somewhere in the code, likely in the join_histogram function that merges the chunks at the end.

sschmidt23 commented 6 months ago

It looks like the normalization is not being tracked properly across multiple chunks: the ancillary data overwrites the normalization each time, so only the first set of M values (for M chunks of chunk_data) is populated and the rest are all zeros. Not sure if this is the only problem, but it's at least one problem.
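To illustrate, here is a hypothetical sketch of that overwrite pattern (variable names like norm_ancil are illustrative, not the actual rail_sklearn ones): if each chunk's M normalization values are assigned to the same leading slice of the ancillary array instead of to that chunk's own slice, every write lands on top of the previous one and everything past the first chunk stays zero.

```python
import numpy as np

n_total, chunk_size = 6, 2
norm_ancil = np.zeros(n_total)  # full-size ancillary normalization array

for start in range(0, n_total, chunk_size):
    chunk_norms = np.ones(chunk_size)  # stand-in for this chunk's sums

    # Buggy pattern: always write the M chunk values at the front,
    # so only the first chunk_size entries are ever populated.
    norm_ancil[:chunk_size] = chunk_norms

    # A correct pattern would write into this chunk's own slice:
    # norm_ancil[start:start + chunk_size] = chunk_norms

print(norm_ancil)  # [1. 1. 0. 0. 0. 0.] -- zeros beyond the first chunk
```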

sschmidt23 commented 6 months ago

Actually, it may be a problem in how the ancillary data is being added to the partial ensembles: the normalization is being computed in each process_chunk call, and those ensemble data are being added together, but the normalization ancil only writes out the ancillary data for that chunk (which makes sense, since each chunk only knows about itself).
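If that diagnosis is right, one plausible shape of the fix (sketched with illustrative names, since I haven't confirmed the actual join_histogram internals) would be to collect each partial ensemble's ancillary normalization and concatenate them in chunk order when the full ensemble is assembled, rather than letting each chunk's ancil overwrite the table:

```python
import numpy as np

# Each partial ensemble carries normalizations only for its own objects.
chunk_ancils = [
    {"normalization": np.array([2.0, 3.0])},  # chunk 0 knows objects 0-1
    {"normalization": np.array([1.0, 4.0])},  # chunk 1 knows objects 2-3
]

# Concatenate in chunk order so every object keeps its own value.
merged_norm = np.concatenate([a["normalization"] for a in chunk_ancils])
print(merged_norm)  # [2. 3. 1. 4.]
```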