Starfish-develop / Starfish

Tools for Flexible Spectroscopic Inference
https://starfish.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
70 stars 22 forks source link

Grid Interface Conformity and Caching #124

Closed mileslucas closed 5 years ago

mileslucas commented 5 years ago

I'd like to point out some inconsistencies in the various interfaces and would also like to propose an enhancement in the way that the interfaces are used.

So I think we need to partially rewrite the interfaces so that they can be used more easily from other parts of the code. We should be able to call any subclass of GridInterface and expect the same results. The biggest divergence from that goal is that the HDF5Interface is considerably different from the RawGridInterface but serves the same base purpose: to interface with the model libraries.

I propose we rewrite the HDF5Interface to inherit from RawGridInterface, which will actually require some rewriting of RawGridInterface itself. If there are any ideas on what exactly we want from an interface, I would love to hear them.

My enhancement proposal is to use caching and instead of generally using Interface.load_flux() to rather use a __call__ method. This may not have any real performance impacts in the sampling code but is a code idea for people who only use the grid tools. This would match the behavior of the Interpolator class. If you think this is a good idea, I can implement it, otherwise, we can maintain the load_flux() style without any __call__ methods.

iancze commented 5 years ago

In general, I think this sounds like a good idea. The fewer variety of interfaces that users have to contend with, the better.

I guess the main design decision is how to properly respect the fact that HDF5Interface and RawGridInterface (and PHOENIX, Kurucz, etc...) are interfacing to different products. The main thing they will have in common is the type and spacing between fundamental stellar parameters. But most other things will be different (wl range, range of stellar parameters, and of course resolution of the spectra (presuming that a broadening and resampling operation occurred)). I like the idea of the simplified interface for loading flux. It may be trickier to keep things consistent enough if you plan on implementing the transforms in #64 as well, but maybe there is a clean way of doing it.

Caching for these products may help some users, but given the use cases we've outlined it might just be an opportunity for bugs and complexity (given the transforms) without too much clear benefit.

mileslucas commented 5 years ago

Some of my recent pushes to refactor/core-numpy have cleaned up some of the inconsistencies and allow a more seamless flow between grid interfaces, hdf5 interfaces, and emulators. Notably items like parameter names can get passed up all the way to SpectrumModels, which means we can have a parameter dictionary with meaningful names rather than 'grid_param:0' or similar.

I've also removed config.yaml and all its subsidaries, which has had 0 impact on the grid interfaces. The biggest difference is that you have to explicitly write out the path for the interfaces (rather than reading from the config).

iancze commented 5 years ago

This sounds great!