UCBerkeleySETI / rawspec


Need optimisation for chunk dimensions #55

Open texadactyl opened 2 years ago

texadactyl commented 2 years ago

The current chunk strategy is not working optimally for turbo_seti.

texadactyl commented 2 years ago

Today, this is the blimpy strategy one can see in waterfall.py, as used by fil2h5: [image not preserved]

texadactyl commented 2 years ago

Initially, rawspec v3 uniformly used chunk dimensions of (1, 1, number_of_fine_channels). This degrades read performance in readers such as turbo_seti.

This was amended to (Nds, 1, Nfpc), where Nds is the number of spectra per dump and Nfpc is the number of fine channels per coarse channel.
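To make the difference between the two shapes concrete, here is a minimal sketch that builds both and reports the size of one chunk. The type and function names (`chunk_shape`, `chunk_whole_spectrum`, `chunk_per_coarse_channel`, `chunk_nbytes`) are hypothetical and do not come from rawspec, and the sample size is left as a parameter because the thread does not state it.

```c
#include <assert.h>
#include <stddef.h>

/* Chunk shape for a 3-D (time, IF, channel) dataset.
 * Hypothetical helper type, not part of rawspec. */
typedef struct {
    size_t dims[3];
} chunk_shape;

/* Original rawspec v3 shape: (1, 1, number_of_fine_channels),
 * i.e. one complete spectrum per chunk. */
chunk_shape chunk_whole_spectrum(size_t n_fine_channels)
{
    assert(n_fine_channels > 0);
    chunk_shape c = { { 1, 1, n_fine_channels } };
    return c;
}

/* Amended shape: (Nds, 1, Nfpc), i.e. all spectra of one dump,
 * restricted to a single coarse channel. */
chunk_shape chunk_per_coarse_channel(size_t nds, size_t nfpc)
{
    assert(nds > 0 && nfpc > 0);
    chunk_shape c = { { nds, 1, nfpc } };
    return c;
}

/* Bytes occupied by one chunk for a given sample size in bytes. */
size_t chunk_nbytes(chunk_shape c, size_t sample_bytes)
{
    return c.dims[0] * c.dims[1] * c.dims[2] * sample_bytes;
}
```

Assuming 4-byte samples and the figures quoted later in this thread (Nfpc = 2^20, 64 coarse channels), `chunk_nbytes(chunk_whole_spectrum(64 * 1048576), 4)` is 256 MiB, while `chunk_nbytes(chunk_per_coarse_channel(1, 1048576), 4)` is 4 MiB, so a reader interested in one coarse channel touches far less data per chunk with the amended shape.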

texadactyl commented 2 years ago

@david-macmahon, from the SETI BL Slack:

Change the chunk size logic such that

Only concern is that for nfpc=2^20 and Nc=64, we will have 1024 "active" chunks for the high frequency resolution product. That might not play well with the chunk cache, so performance testing this idea would be critical. I think it would make the read side (e.g. turbo_seti) very happy.


Agreed. Much better, IMO. Getting started on work towards tag v3.1.2. The only affected module is fbh5_open.c.

cc: @lacker @mattlebofsky

texadactyl commented 2 years ago

fbh5_open.c state

Control chunking with int USE_BLIMPY = 0; // 1 : use blimpy's algorithm; 0 : don't do that

Control caching with int CACHING_TYPE = 1; // 0 : no caching ; 1 : computed caching specifications; 2 : default caching

chunking logic in fbh5_open.c:

     cdims[0] = Nd;               // number of spectra per dump
     cdims[1] = 1;
     cdims[2] = p_fb_hdr->nfpc;   // number of fine channels per coarse channel

computed caching in force:

    fcache_nslots = (cdims[0] * cdims[2]) + 1;
    fcache_nbytes = (Nd * p_fbh5_ctx->tint_size) + 1;   // tint_size: byte size of one spectrum
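
The computed caching values above can be reproduced in isolation with the sketch below. It is a standalone restatement of the two formulas, not the code in fbh5_open.c: the `cache_spec` type and `computed_cache` function are hypothetical, and plain `size_t` stands in for whatever types the real module uses. The results would presumably be handed to an HDF5 cache-setting call such as `H5Pset_chunk_cache`, but that usage is an assumption and is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the two computed cache parameters. */
typedef struct {
    size_t nslots;   /* number of chunk-cache slots */
    size_t nbytes;   /* chunk-cache size in bytes */
} cache_spec;

/* Restates the "computed caching" formulas from the thread:
 *   nslots = cdims[0] * cdims[2] + 1
 *   nbytes = Nd * tint_size + 1
 * where tint_size is the byte size of one spectrum. */
cache_spec computed_cache(const size_t cdims[3], size_t nd, size_t tint_size)
{
    assert(cdims != NULL);
    cache_spec s;
    s.nslots = cdims[0] * cdims[2] + 1;
    s.nbytes = nd * tint_size + 1;
    return s;
}
```

For example, with illustrative values cdims = (4, 1, 1024), Nd = 4, and a 262144-byte spectrum, this gives 4097 slots and a 1048577-byte cache. Working through a few such cases is a quick way to see how both values scale with Nd and nfpc before performance testing.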