Refactor RRTMGP to compute fluxes over "packed" CRM data

Re-implement RRTMGP for MMF configurations so that CRM data are "packed" (CRM columns combined into a single column dimension) before the flux solvers are called. Previously we looped over the CRM columns and called the solvers once per CRM column. Once the radiation driver is ported to GPUs with OpenACC or OpenMP, the packed approach should make better use of the hardware, because the looping approach launches crm_nx_rad * crm_ny_rad times as many kernels.

The trade-off is much larger working arrays in the radiation code. On CPUs I see roughly a 2x slowdown compared with an analogous implementation that uses the "looping" strategy (see branch brhillman/atm/add-sp-rrtmgp-loop).
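The difference between the two strategies can be sketched in Python. This is a minimal illustration, not the E3SM/RRTMGP code: `toy_flux_solver` is a hypothetical stand-in for an RRTMGP flux solver, and the `crm[icol][ix][iy]` layout (one vertical profile per GCM column `icol` and CRM column `ix`, `iy`) is an assumed data layout. The looping version calls the solver once per CRM column, each time over the `ncol` GCM columns, for `nx * ny` calls in total. The packed version builds one working array of `ncol * nx * ny` columns and calls the solver once.

```python
from typing import Callable, List, Tuple

Column = List[float]                      # one vertical profile (nlev values)
Grid = List[List[List[Column]]]           # assumed layout: crm[icol][ix][iy]
Solver = Callable[[List[Column]], List[float]]


def toy_flux_solver(columns: List[Column]) -> List[float]:
    # Hypothetical stand-in for an RRTMGP flux solver: one value per column.
    return [sum(col) for col in columns]


def fluxes_looping(crm: Grid, solver: Solver) -> Tuple[list, int]:
    # One solver call per CRM column (ix, iy), each over all ncol GCM columns.
    ncol, nx, ny = len(crm), len(crm[0]), len(crm[0][0])
    out = [[[0.0] * ny for _ in range(nx)] for _ in range(ncol)]
    ncalls = 0
    for ix in range(nx):
        for iy in range(ny):
            batch = [crm[icol][ix][iy] for icol in range(ncol)]
            flux = solver(batch)
            ncalls += 1
            for icol in range(ncol):
                out[icol][ix][iy] = flux[icol]
    return out, ncalls


def fluxes_packed(crm: Grid, solver: Solver) -> Tuple[list, int]:
    # Pack all CRM columns into one column dimension of size ncol * nx * ny,
    # with packed index ipack = (icol * nx + ix) * ny + iy.
    ncol, nx, ny = len(crm), len(crm[0]), len(crm[0][0])
    packed = [crm[icol][ix][iy]
              for icol in range(ncol) for ix in range(nx) for iy in range(ny)]
    flux = solver(packed)  # a single call over the larger working array
    # Unpack the results back to the [icol][ix][iy] layout.
    out = [[[flux[(icol * nx + ix) * ny + iy] for iy in range(ny)]
            for ix in range(nx)] for icol in range(ncol)]
    return out, 1
```

Both functions return the same fluxes. The packed version trades `nx * ny` solver calls for one call over an array `nx * ny` times larger, which mirrors the GPU-versus-memory trade-off described above.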

E3SM-Project / ACME-ECP #101