Closed sschmidt23 closed 6 months ago
It looks like the normalization is not being tracked properly with multiple chunks: the ancillary data overwrites the normalization each time, so that only the first set of M (for M chunks of chunk_data) has values and the rest are all zeros. Not sure if this is the only problem, but it's at least one problem.
Actually, it may be a problem in how the ancillary data is being added to the partial ensembles: it looks like the normalization is being computed in each `process_chunk`, and those ensemble data are being added, but the normalization ancil only writes out the ancillary data for that chunk (which makes sense, as each chunk only knows about itself).
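A minimal sketch of that failure mode, using plain numpy and hypothetical names rather than the actual RAIL/qp API: if the merge keeps only one chunk's normalization when the partial ensembles are combined, the rest of the ancillary array stays zero.

```python
import numpy as np

# Hypothetical illustration, not the actual RAIL/qp code.
# Each chunk produces a per-chunk normalization ("ancil") that only
# covers its own rows.
n_rows, chunk_size = 6, 2
per_chunk = [np.full(chunk_size, i + 1.0) for i in range(n_rows // chunk_size)]

# Buggy merge: the ancillary data is taken from a single chunk and
# padded, so only that chunk's rows get values and the rest stay zero.
ancil_buggy = np.zeros(n_rows)
ancil_buggy[:chunk_size] = per_chunk[0]

# Fixed merge: concatenate (or slice-assign) the per-chunk arrays so
# every chunk's normalization survives the merge.
ancil_fixed = np.concatenate(per_chunk)

print(ancil_buggy)  # [1. 1. 0. 0. 0. 0.]
print(ancil_fixed)  # [1. 1. 2. 2. 3. 3.]
```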
Irene Moskowitz messaged me pointing out that NZDir was failing for a dataset that she was attempting to run, returning a qp ensemble with `NaN` for every entry. I ran the demo notebook and it ran fine for the default data, but failed in the way Irene described when I used a larger dataset of my own. I noticed that the demo notebook has three samples, all smaller than the default `chunk_size` of 10,000; if I set `chunk_size=1000` then the demo notebook fails with the error:

So, it appears that there is a bug somewhere in the code, likely in the `join_histogram` function that merges the chunks at the end.
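A hedged sketch of how the `NaN` entries could arise when the per-chunk histograms are merged; `join_histograms_fixed` is a hypothetical stand-in, not the actual `join_histogram` implementation in the code base.

```python
import numpy as np

rng = np.random.default_rng(0)
n_chunks, n_bins = 3, 5
chunk_hists = rng.random((n_chunks, n_bins))   # per-chunk histograms
chunk_norms = chunk_hists.sum(axis=1)          # per-chunk normalizations

# Buggy behaviour: the merge keeps ancillary data (histogram + norm)
# only for the first chunk and leaves zeros for the rest, so the later
# rows normalize to 0/0 = NaN.
bad_hists = np.zeros_like(chunk_hists)
bad_hists[0] = chunk_hists[0]
bad_norms = np.zeros(n_chunks)
bad_norms[0] = chunk_norms[0]
with np.errstate(invalid="ignore", divide="ignore"):
    bad_pdf = bad_hists / bad_norms[:, None]   # rows 1..M-1 are 0/0 = NaN

def join_histograms_fixed(hists, norms):
    """Sum the per-chunk histograms and normalizations before dividing."""
    return hists.sum(axis=0) / norms.sum()

print(np.isnan(bad_pdf).any())                   # True
print(join_histograms_fixed(chunk_hists, chunk_norms))  # finite, sums to 1
```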