Add logic to avoid reallocating ZCOMBUF[RS] at each call

This is a significant optimisation of the CPU code path. Credit owed to @marsdeno.
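The idea of keeping a buffer alive between calls can be sketched as follows. This is a minimal illustration in C, not the ectrans code (which is Fortran): the `comm_buffer` type, `comm_buffer_reserve`, `comm_buffer_free` and the `n_allocs` counter are hypothetical names invented for the sketch, and the buffer stands in for something like ZCOMBUF[RS]. Storage is reallocated only when a call asks for more elements than are already held.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical persistent buffer standing in for a communication buffer.
   Initialise with {0} before first use. */
typedef struct {
    double *data;     /* storage kept alive between calls */
    size_t capacity;  /* number of doubles currently allocated */
    size_t n_allocs;  /* how many times storage was (re)allocated */
} comm_buffer;

/* Return storage for at least n doubles.
   Reallocates only when n exceeds the current capacity; otherwise the
   existing storage is handed back unchanged. Returns NULL on allocation
   failure, in which case the old storage is still owned by buf. */
double *comm_buffer_reserve(comm_buffer *buf, size_t n)
{
    assert(buf != NULL);
    if (n > buf->capacity) {
        double *p = realloc(buf->data, n * sizeof *p);
        if (p == NULL) {
            return NULL;
        }
        buf->data = p;
        buf->capacity = n;
        buf->n_allocs++;
    }
    return buf->data;
}

/* Release the storage once, e.g. at shutdown. */
void comm_buffer_free(comm_buffer *buf)
{
    assert(buf != NULL);
    free(buf->data);
    buf->data = NULL;
    buf->capacity = 0;
}
```

With this pattern, 100 iterations that each request the same size trigger a single allocation instead of 100; a later, larger request grows the buffer once more. The trade-off is that the memory stays resident between calls.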

TCO1279, 48-node benchmark (--norms --truncation 1279 --niter 100 --nlev 137 --nfld 1 --vordiv --uvders --scders -v):

develop:

Inverse-direct transforms
-------------------------
avg  (s):   0.4258
min  (s):   0.3726
max  (s):   1.2771
med  (s):   0.4168
loop (s):  50.9419

pre_allocated_buffers:

Inverse-direct transforms
-------------------------
avg  (s):   0.2227
min  (s):   0.1793
max  (s):   1.1176
med  (s):   0.2128
loop (s):  30.9310

Median transform time drops from 0.4168 s to 0.2128 s (almost 2x faster) and total loop time from 50.94 s to 30.93 s (about 1.65x faster), with identical norms.

ecmwf-ifs / ectrans