Model compression + chunked evaluation

Models can be compressed in the following ways:

[ ] For point and Gaussian models, the spectrum can be computed on the fly, so only the spectral coefficients need to be stored.
[ ] For FITS models, we can store, per channel, a pointer to the image responsible for that channel.

This should be enforced at the model construction level. Using DI, the predict step should then be abstracted so that it can be derived from the model itself. Is this a big rewrite?
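
To make the compression idea concrete, here is a minimal sketch in JAX. Everything in it is an assumption for illustration, not the existing DSA2000-Cal API: the class names `PointModel` and `FITSModel`, their fields, and in particular the power-law form of the spectrum are hypothetical stand-ins for whatever coefficients the real models carry.

```python
from typing import NamedTuple

import jax.numpy as jnp


class PointModel(NamedTuple):
    """Hypothetical compressed point-source model: stores coefficients only."""
    ref_flux: jnp.ndarray        # [num_sources] flux at ref_freq
    spectral_index: jnp.ndarray  # [num_sources] power-law index (assumed form)
    ref_freq: float              # reference frequency in Hz

    def spectrum(self, freqs: jnp.ndarray) -> jnp.ndarray:
        # Computed on the fly: S(nu) = S0 * (nu / nu0) ** alpha
        ratio = freqs[None, :] / self.ref_freq
        return self.ref_flux[:, None] * ratio ** self.spectral_index[:, None]


class FITSModel(NamedTuple):
    """Hypothetical compressed FITS model: channels point at shared images."""
    images: jnp.ndarray            # [num_images, ny, nx] unique images
    channel_to_image: jnp.ndarray  # [num_channels] integer index into images

    def image_for_channel(self, chan: int) -> jnp.ndarray:
        # Look up the image responsible for this channel via its pointer.
        return self.images[self.channel_to_image[chan]]
```

Because both classes are `NamedTuple`s, JAX treats them as pytrees, so they can be passed through `jit`, `vmap`, and `scan` directly. A predict function could then take any object exposing the relevant method, which is one way the predict could be derived from the model level.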

For chunked evaluation, note that data can already be sharded over devices according to frequency, so each device is meant to handle its own local shard. Chunked processing should then use scan(vmap) to process chunks sequentially. According to #72, this is better than vmap(scan).
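
The scan(vmap) pattern can be sketched as follows: an outer `jax.lax.scan` walks over chunks one at a time, and an inner `jax.vmap` vectorises the work within each chunk. The names `predict_one` and `chunked_predict` are hypothetical, `predict_one` is a toy stand-in for the real per-frequency predict, and the sketch assumes the local shard length is divisible by `chunk_size`.

```python
import jax
import jax.numpy as jnp


def predict_one(freq, coeffs):
    # Toy stand-in (assumption) for the real per-frequency predict.
    return jnp.sum(coeffs * freq)


def chunked_predict(freqs, coeffs, chunk_size):
    # freqs is the local shard; it is assumed to divide evenly into chunks.
    num_chunks = freqs.shape[0] // chunk_size
    chunks = freqs.reshape(num_chunks, chunk_size)

    def body(carry, chunk):
        # Inner vmap: evaluate all frequencies of one chunk in parallel.
        out = jax.vmap(predict_one, in_axes=(0, None))(chunk, coeffs)
        return carry, out

    # Outer scan: process the chunks sequentially, so only one chunk's
    # intermediates are live at a time.
    _, out = jax.lax.scan(body, None, chunks)
    return out.reshape(-1)
```

The result should match a plain `jax.vmap(predict_one, in_axes=(0, None))(freqs, coeffs)` over the whole shard; `chunk_size` only trades peak memory against the number of sequential steps.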
