Data reduction pipeline for Mazinlab MKID instruments - see also The MKID Pipeline paper for further detail.
Start by creating and entering an src directory, then clone the GitHub repository into it:
mkdir src
cd src
git clone https://github.com/mazinlab/mkidpipeline.git
Then install mkidpipeline. You can do this in one of two ways, listed in order of preference:
pipx install -e ./mkidpipeline
This is an editable install that references the files in ./mkidpipeline instead of copying them. Changes made in the repo will automatically be reflected in your local version, except for changes to the pyproject.toml. If there are changes to the pyproject.toml, execute:
pipx upgrade mkidpipeline
to reflect them in your local version.
**This method does not work on dark or glados as of 11/24; it is only available on wheatley or your local server.**
pipx install git+https://github.com/mazinlab/mkidpipeline.git
or pipx install ./mkidpipeline
This installs the repo separately from your local clone. For any changes made on GitHub, including to the pyproject.toml, you will need to execute: pipx upgrade mkidpipeline
You should now see the subdirectory mkidpipeline with its associated files in your src directory.
Navigate back to your src directory and execute the following:
git clone https://github.com/mazinlab/mkidcore.git
PDM is a development tool used to manage dependencies and run tests. If you are planning on making changes to the pipeline and/or want to run debugging tests, you will need to install pdm. These commands may take a few minutes.
cd src/mkidpipeline
pdm install --dev
cd ../mkidcore
pdm install --dev
pdm add mkidcore
**If you are having issues with pdm, try pip uninstalling and reinstalling it. If pdm is installed but the command is not found, try:
python -m pdm install --dev
instead of pdm install --dev
The python -m prefix ensures you're using the correct environment. You can also troubleshoot by double-checking that the file path of your Python (which python) and that of your pdm installation (python -m pip show pdm) match.
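To see from inside Python exactly which interpreter (and hence which environment) you are using, this standard-library check helps:
import sys
print(sys.executable)  # path of the running interpreter; should match the output of `which python`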
Create a working directory and execute:
mkidpipe --init MEC
or mkidpipe --init xkid
depending on the instrument data you are using.
This will create three YAML config files (NB "_default" will be appended if the file already exists):
pipe.yaml - The pipeline global configuration file.
data.yaml - A sample dataset. You'll need to redefine this with your actual data.
out.yaml - A sample output configuration. You'll need to redefine this as well.
Each of these files contains extensive comments, and irrelevant settings (or those for which the defaults are fine) may be omitted. More details for pipe.yaml are in mkidpipeline.pipeline and mkidpipeline.steps.<stepname>. More details for the other two are in mkidpipeline.config. Data and output yaml files can be vetted, with helpful errors, by running
mkidpipe --vet
in the directory containing your three files.
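The same definitions can also be vetted from Python using the pipeline API shown in the shell section at the end of this document (a minimal sketch; it mirrors, but may not exactly reproduce, everything mkidpipe --vet checks):
import mkidpipeline.config as config
import mkidpipeline.definitions as definitions
# Load the global pipeline configuration, then build the output
# collection from the data/out definitions and print any problems
config.configure_pipeline('pipe.yaml')
o = definitions.MKIDOutputCollection('out.yaml', datafile='data.yaml')
print(o.validation_summary())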
To build and reduce this dataset, open the pipe.yaml and make sure you are happy with the default paths; these should be sensible if you are working on GLADoS. On dark you'll want to change the darkdata folder to data. If the various output paths don't exist they will be created, though permissions issues could cause unexpected results. Using a shared database location might save you some time and is strongly encouraged, at least across all of your pipeline runs (consider collaborating even with other users)! Outputs will be placed into a generated directory structure under out and WILL clobber existing files with the same name.
The flow section of the pipe.yaml (see also below) lists all of the pipeline steps that will be executed when doing the reduction. Here you may comment out or delete any steps you do not wish to run. For example, to run all steps except cosmiccal and speccal, the flow will look like this:
flow:
- buildhdf
- attachmeta
- wavecal
#- cosmiccal
- pixcal
- flatcal
- wcscal
#- speccal
Note that buildhdf, attachmeta, and wavecal need to be run for all reductions or else you will run into unexpected behavior.
To generate all necessary directories as specified in the paths section of the pipe.yaml, run
mkidpipe --make-dir
NOTE: The default values for these paths will need to be changed to point to the appropriate locations for your machine.
To run the full calibration pipeline and generate specified outputs, use
mkidpipe --make-outputs
in the directory containing the three yaml files.
See mkidpipe --help for more options, including how to run a single step or specify yaml files in different directories.
After a while (~TODO hours with the defaults) you should have some outputs to look at. To really get going you'll now need to use observing logs to figure out what your data.yaml and out.yaml should contain for the particular data set you want to look at. Look for good seeing conditions and note the times of the nearest laser cals.
When run, the pipeline goes through several steps, some only as needed, and only as needed for the requested outputs, so it won't slow you down to have all your data defined in one place (i.e. you do not need multiple data.yaml files for different targets in a given night; they can all go in the same file). The steps are each described briefly in the list below, with practical notes for each step following.
- buildhdf: Photontable (mkidpipeline.photontable.Photontable) files for the defined outputs are created as needed by mkidpipeline.steps.buildhdf and are saved to the out path in the form of HDF5 (.h5) files.
- attachmeta: Metadata is attached to the photon tables (mkidpipeline.pipeline.batch_apply_metadata).
- wavecal: Wavelength calibration solutions are fetched (mkidpipeline.steps.wavecal.fetch). There is some intelligence here, so if the global config for the step and the start/stop times of the data are the same then the solution will not be regenerated. Solutions are applied by mkidpipeline.steps.wavecal.apply.
- pixcal: Applied by mkidpipeline.steps.pixcal.apply.
- lincal: Applied by mkidpipeline.steps.lincal.apply. Note that the maximum correction is <<1% at MEC's maximum count rate and this step takes >1 hour, so it is recommended to omit this step.
- cosmiccal: Applied by mkidpipeline.steps.cosmiccal.apply.
- flatcal: Flat calibration solutions are fetched (mkidpipeline.steps.flatcal.fetch). As with wavecal, if the global config for the step and the start/stop times of the data are the same then the solution will not be regenerated. Applied by mkidpipeline.steps.flatcal.apply.
- wcscal: WCS solutions are fetched (mkidpipeline.steps.wcscal.fetch), though at present this is a no-op as that code has not been written. All WCS solutions must thus be defined using the wcscal outputs of platescale and so forth (see the example).
- output: Outputs are generated (mkidpipeline.steps.output.generate).
- speccal: Spectrophotometric calibration solutions are fetched (mkidpipeline.steps.speccal.fetch).
Practical notes for each step:
wavecal - Takes in a series of laser exposures as inputs, with optional dark exposures to remove unwanted backgrounds.
lincal - Takes a long time to run and is a sub-percent-level effect on a typical data set. Not recommended for standard reductions.
cosmiccal - Requires no additional calibration data. Parameters are defined entirely in the pipe.yaml and the defaults are typically sufficient for most use cases.
flatcal - Flat fields are based on either whitelight (e.g. a classical quartz/dome/sky flat) observations or on a set of laser observations used for a wavecal. The whitelight codepath is operational but has not seen appreciable use, as MEC flats are generally laser-based.
wcscal - Some general notes: MEC observations are always taken with the derotator off (pupil tracking mode) and the telescope target generally with a coronagraph. MEC observations are sometimes taken for ADI.
Use Cases
TODO: Add some sort of parallactic angle offset support to build_wcs and teach get_wcs about it. Rip out the single PA time, then make the drizzler compute offsets for each timestep; derotate becomes always true, and an ADI mode turns on computation of per-bin PA offsets.
The end result of this is: outputs gets an adi mode setting, derotate defaults to true, align_start_pa vanishes, and wcs_timestep leaves the config files and takes its default of the nonblurring value.
speccal - Currently not implemented. Converts pixel counts into a flux density.
Instead of running the pipeline from the command line, MKID data can be manipulated directly from a python shell.
Generally, shell operation would consist of something to the effect of:
import mkidpipeline.definitions as definitions
import pkg_resources as pkg
import mkidpipeline.config as config
import mkidpipeline.pipeline as pipe
import mkidcore as core
from mkidpipeline.photontable import Photontable
from mkidcore.corelog import getLogger
#set up logging
lcfg = pkg.resource_filename('mkidpipeline', './utils/logging.yaml')
getLogger('mkidcore', setup=True, logfile='mylog.log', configfile=lcfg).setLevel('WARNING')
getLogger('mkidpipeline').setLevel('WARNING')
#To access photon tables directly
pt = Photontable('/path/to/h5_file.h5')
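# Photontable wraps the HDF5 (.h5) photon list written by buildhdf; see
# mkidpipeline.photontable for the methods it exposes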
# To load and manipulate the full MKID data sets as defined in the YAMLS
config.configure_pipeline('pipe.yaml')
o = definitions.MKIDOutputCollection('out.yaml', datafile='data.yaml')
print(o.validation_summary())
# ... then playing around
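As a sketch of what that might look like (whether an MKIDOutputCollection is iterable is an assumption here, so verify against mkidpipeline.definitions before relying on it):
# ASSUMPTION: iterating the collection yields the defined outputs,
# whose reprs summarize what will be produced
for output in o:
    print(output)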