This will serve as a roadmap for the implementation of the "chunking" approach, which will eventually become a major mode of operation for Starfish. The rationale is that fitting spectral model grids (for main sequence stars at high resolution, cool stars at high and low resolution, and exoplanet spectra in general) is currently a systematics-dominated problem. That is, there are wavelength regions of the data spectrum that the model grids cannot reproduce accurately for any choice of parameters. It follows that if our main goal is inference of accurate stellar parameters, then a main focus will need to be a "calibration" of these systematic effects. Note that in some sense this roadmap supersedes that of #58, although the ideas in that roadmap are mostly complementary to those presented here.

Instead of fitting the full spectrum and downweighting discrepant regions (as in "classic" Starfish), we propose to fit individual chunks of the spectrum at a time. Chunking allows us to compare smaller regions of spectra to models and identify more easily where models are inaccurate.

The chunking approach will function by segmenting the spectrum into independent regions, where spectral inference to determine the fundamental stellar properties (Teff, log g, [Fe/H], etc.) is done on each chunk independently. In an obvious sense, this violates much of what we know about stellar astrophysics, i.e., the emergent spectrum is the realization of complex stellar astrophysics and each spectral line is by no means physically independent from the others. However, since we are dealing with strong model systematics (e.g., some spectral lines simply do not fit the data for any combination of Teff, log g, [Fe/H]), this approach allows us to get a better lay of the land, and provides a groundwork for exploring which regions of the spectrum we can trust and which ones we should be skeptical of.

There are a few tasks that need to be addressed in order to implement this approach.

Setup and initialization

First, the user should be able to take a model grid, a data spectrum, and a list of chunk wavelength boundaries, and then run some scripts to segment the data into individual chunks. The idea is that the inference on each chunk can be done completely independently from any other chunk, and so the scripts should be organized to run with that in mind. Once the posterior for each chunk is delivered, however, we will want tools that can pull the posteriors from each directory and plot them.

[ ] Given a user input of wavelength chunks, automatically create output subdirectories, labeled by chunk ID and wavelength boundaries
[ ] Segment the data and the model grids to a reasonable wavelength range (maybe +10% extra on either side of the edges) and effective temperature range (e.g., 2000K - 4000K), and copy them to the directory in an appropriate HDF5 format.
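
As a rough sketch of the segmentation step, assuming the data spectrum is available as sorted wavelength and flux lists, and reading "+10%" as 10% of the chunk width on each side. The helper names (`padded_bounds`, `slice_spectrum`, `select_temperatures`) are hypothetical and not part of Starfish, and writing the result to HDF5 is left out here.

```python
from bisect import bisect_left, bisect_right


def padded_bounds(wl_start, wl_end, pad_fraction=0.10):
    """Widen a chunk by pad_fraction of its width on each side (assumed meaning of "+10%")."""
    if wl_end <= wl_start:
        raise ValueError("wl_end must be greater than wl_start")
    pad = pad_fraction * (wl_end - wl_start)
    return wl_start - pad, wl_end + pad


def slice_spectrum(wl, fl, wl_start, wl_end, pad_fraction=0.10):
    """Return the pixels of a sorted wavelength list that fall inside the padded chunk."""
    lo, hi = padded_bounds(wl_start, wl_end, pad_fraction)
    i0 = bisect_left(wl, lo)   # first pixel with wl >= lo
    i1 = bisect_right(wl, hi)  # one past the last pixel with wl <= hi
    return wl[i0:i1], fl[i0:i1]


def select_temperatures(teff_points, teff_min=2000, teff_max=4000):
    """Keep only the grid temperatures inside the requested range (inclusive)."""
    return [t for t in teff_points if teff_min <= t <= teff_max]
```

For a 6900 - 7000 chunk this keeps pixels between 6890 and 7010, so the emulator has some margin at the chunk edges.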

There are a few considerations to take care of here. The individual chunk directories will be labeled by chunkID_wlstart_wlend, zero-padded so that there are no naming conflicts when using typically sized chunks from optical to infrared wavelengths.
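
One possible way to build these labels, as a sketch only. It assumes wavelengths in Angstroms rounded to integers, a 3-digit chunk ID, and 6-digit wavelength fields. The widths and the function names `chunk_dir_name` and `make_chunk_dirs` are illustrative choices, not settled conventions.

```python
from pathlib import Path


def chunk_dir_name(chunk_id, wl_start, wl_end, id_width=3, wl_width=6):
    """Build a zero-padded chunkID_wlstart_wlend label (widths are assumptions)."""
    return (
        f"{chunk_id:0{id_width}d}"
        f"_{round(wl_start):0{wl_width}d}"
        f"_{round(wl_end):0{wl_width}d}"
    )


def make_chunk_dirs(base_dir, boundaries):
    """Create one sub-directory per (wl_start, wl_end) pair and return the paths."""
    paths = []
    for chunk_id, (wl_start, wl_end) in enumerate(boundaries):
        path = Path(base_dir) / chunk_dir_name(chunk_id, wl_start, wl_end)
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        paths.append(path)
    return paths
```

With fixed-width fields the directory names sort lexicographically in chunk order, which keeps shell globs and later plotting scripts simple.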

Also, we need an easy way to regenerate the sub-directories if the chunk wavelength boundaries change. There should also be a way to select an individual sub-directory and regenerate just that one. For these reasons, we are thinking that an individual Makefile within each sub-directory might be the best option.

Tasks within each sub-directory

[ ] Set up an emulator to run on this chunk of the model (e.g., 2000K - 4000K, 6900AA - 7000AA)
[ ] Launch `star_chunk.py` to sample on this individual chunk, creating samples of a mini-posterior.

Are there any necessary changes that need to be made for the emulator? Currently nothing major comes to mind, but I could be forgetting something.

Necessary improvements to `star_chunk.py`

[ ] Implement/improve the ability to use user-defined priors (#32). Currently this means implementing more robust error checking.
[ ] Implement the ability to fix parameters in a general framework, which will help adapt this framework to multiple use cases (a possible solution here: https://github.com/iancze/PSOAP/blob/master/psoap/utils.py)
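
To make the fixed-parameter idea concrete, here is a minimal sketch of one way it could work: the sampler only sees the free parameters, and a pair of converters maps between the free vector and the full vector. The function `make_converters` and the parameter names are hypothetical. This is not the PSOAP implementation linked above, nor an existing Starfish API.

```python
def make_converters(all_names, fixed):
    """Build converters between the full parameter list and the free subset.

    all_names : ordered list of every parameter name
    fixed     : dict mapping the names of fixed parameters to their values
    """
    unknown = set(fixed) - set(all_names)
    if unknown:
        raise ValueError(f"Unknown fixed parameters: {sorted(unknown)}")
    free_names = [name for name in all_names if name not in fixed]

    def to_full(free_values):
        """Insert the fixed values to recover the full parameter list."""
        if len(free_values) != len(free_names):
            raise ValueError("Wrong number of free parameters")
        free = dict(zip(free_names, free_values))
        return [fixed[n] if n in fixed else free[n] for n in all_names]

    def to_free(full_values):
        """Drop the fixed entries, keeping only what the sampler varies."""
        if len(full_values) != len(all_names):
            raise ValueError("Wrong number of parameters")
        return [v for n, v in zip(all_names, full_values) if n not in fixed]

    return free_names, to_full, to_free
```

For example, fixing log g while sampling the rest:

```python
names = ["Teff", "logg", "Z", "vsini"]
free_names, to_full, to_free = make_converters(names, {"logg": 4.5})
full = to_full([3500.0, 0.0, 5.0])  # logg is filled in from the fixed value
```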

Note that these mini-tasks can also be launched en masse by a top-level bash script.

Inference

[ ] Read in "mini-posteriors" for individual chunks and then plot them.
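
A sketch of the collection step, under a loud assumption: each chunk sub-directory holds a file called `samples.csv` with one column per parameter and a header row. The file name, the format, and the functions `read_samples` and `summarize_chunks` are all placeholders, since the output format of the sampler has not been decided here. The per-chunk medians it returns could then be handed to a plotting library.

```python
import csv
from pathlib import Path
from statistics import median


def read_samples(path):
    """Read a CSV of posterior samples into {parameter name: list of floats}."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = {name: [] for name in reader.fieldnames}
        for row in reader:
            for name in columns:
                columns[name].append(float(row[name]))
    return columns


def summarize_chunks(base_dir, filename="samples.csv"):
    """Return {chunk directory name: {parameter: median}} for every chunk found."""
    summary = {}
    # Sorting by path relies on zero-padded directory names to give chunk order.
    for path in sorted(Path(base_dir).glob(f"*/{filename}")):
        samples = read_samples(path)
        summary[path.parent.name] = {
            name: median(values) for name, values in samples.items()
        }
    return summary
```

Comparing, say, the median Teff across chunks is a quick first look at which wavelength regions pull the inference away from the rest.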

Starfish-develop / Starfish

Roadmap for Chunked Inference Approach #74
