dirac-institute / trailblazer

An open data repository for astronomical data products affected by satellites
MIT License

Standardize header metadata. #10

Closed DinoBektesevic closed 3 years ago

DinoBektesevic commented 3 years ago

This is a first attempt at taking the varied FITS headers and standardizing a select number of header keywords that we can use in our header metadata DB table.

The header keywords we standardize on here will set our table schema and our query tool interface, but there are some outstanding issues with standardizing the WCSs. If we cannot easily separate WCS data into a standardized set of columns, querying that table will be much harder, but it should still be possible to save a subset of keywords (center pixel values and coordinates) and the whole WCS as a pickled blob.
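To make the fallback concrete, here is a minimal sketch, assuming the header has already been read into a plain dict. The standard FITS reference-pixel keywords (`CRPIX1/2`, `CRVAL1/2`) stand in for the "center pixel values and coordinates", and every other WCS card is pickled as one blob. The function name `split_wcs`, the column names and the list of keyword prefixes are all illustrative assumptions, not what this PR implements.

```python
import pickle

# Standard FITS WCS keywords kept as individual, queryable columns.
# The column names on the left are hypothetical.
COLUMN_KEYS = {
    "pixel_x": "CRPIX1",
    "pixel_y": "CRPIX2",
    "ra": "CRVAL1",
    "dec": "CRVAL2",
}

# Keyword prefixes treated as WCS cards (an assumption; real headers may
# carry distortion terms under instrument-specific names as well).
WCS_PREFIXES = (
    "CRPIX", "CRVAL", "CTYPE", "CUNIT", "CDELT",
    "CD1_", "CD2_", "PC1_", "PC2_",
)


def split_wcs(header):
    """Split a header dict into (queryable columns, pickled WCS blob)."""
    # Missing keywords become None so every row has the same columns.
    columns = {name: header.get(key) for name, key in COLUMN_KEYS.items()}
    # str.startswith accepts a tuple of prefixes.
    wcs_cards = {k: v for k, v in header.items() if k.startswith(WCS_PREFIXES)}
    return columns, pickle.dumps(wcs_cards)


header = {
    "CRPIX1": 1024.0, "CRPIX2": 1024.0,
    "CRVAL1": 150.1, "CRVAL2": 2.2,
    "CTYPE1": "RA---TAN", "CTYPE2": "DEC--TAN",
    "CD1_1": -7.3e-5, "EXPTIME": 30.0, "PCOUNT": 0,
}
columns, blob = split_wcs(header)
```

Queries can then filter on the four columns, while anything that needs the full solution unpickles the blob.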

I suspect this would not be as optimal as storing all of these values separately. Ergo, this draft PR, to give some insight into what the difficulties are.

mrawls commented 3 years ago

Metadata still missing: band or filter, exposure duration (grab independently instead of doing end - start if feasible), some kind of processed/reduced flag (probably either "yes" or "unknown")

DinoBektesevic commented 3 years ago

> Metadata still missing: band or filter, exposure duration (grab independently instead of doing end - start if feasible), some kind of processed/reduced flag (probably either "yes" or "unknown")

Added filter and exposure time for astro_metadata_translator recognized instruments.

Added a test dataset to get an overview of where functionality is right now. Coverage right now is about half of the dataset you compiled. Images in the dataset have been cropped for size, but the WCS values have been shifted to maintain positional accuracy (approximately).

I started gnawing at getting more instruments supported today, but there is nothing that I would like to commit quite yet. I think our discussion was on point and that, whether we want to or not, we will have to create HeaderStandardizer and ImageStandardizer classes that do the atomic work on a single header and image, and then see if FitsProcessor can be made to support both single and multi-extension FITS files. If not, a bit of a redesign of the FitsProcessor into a SingleExtensionFitsProcessor and a MultiExtensionFitsProcessor might be needed, so I don't want to promote this into a full PR quite yet.

DinoBektesevic commented 3 years ago

Ok, this is now getting much better. I think someone may take a look at what is in this code and not feel offended.

Added header standardizer classes, essentially maps between what we want in our DB and what we can find in a particular instrument's header.
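Roughly, the idea looks like the sketch below. It is a simplified illustration and not the code in this PR: the `keyword_map` attribute, the `standardize` method, the `ExampleCamStandardizer` instrument and the column names are all made up for the example, and the header is assumed to be a plain dict.

```python
class HeaderStandardizer:
    """Maps instrument-specific header keywords onto shared DB column names."""

    # DB column name -> header keyword; each instrument overrides this.
    keyword_map = {}

    def __init__(self, header):
        self.header = header

    def standardize(self):
        # Missing keywords become None so every instrument yields the same columns.
        return {col: self.header.get(key) for col, key in self.keyword_map.items()}


class ExampleCamStandardizer(HeaderStandardizer):
    # Hypothetical instrument; the keyword choices are illustrative only.
    keyword_map = {
        "exposure_time": "EXPTIME",
        "filter_name": "FILTER",
        "obs_start": "DATE-OBS",
    }


row = ExampleCamStandardizer(
    {"EXPTIME": 30.0, "FILTER": "r", "DATE-OBS": "2020-01-01T00:00:00"}
).standardize()
```

Supporting a new instrument then mostly means writing another small subclass with its own map.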

Processors are essentially recipes for how a particular file should be processed. For example, a compressed archive needs to be uncompressed and "unarchived", and then each file needs to be processed; but is this particular file a Rubin calexp (then the image is only in the 1st HDU), or is it a DECam NOAO image (then all HDUs have headers and data), or is it an SDSS....
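A minimal sketch of that recipe dispatch, under some loud assumptions: a file is modeled as a list of `(header, data)` tuples instead of real HDU objects, the `INSTRUME` values are invented, "1st HDU" is taken to mean index 1 (the first extension after the primary), and the `can_process` / `image_hdus` / `select_processor` names are hypothetical and not necessarily the interface in this PR.

```python
class FitsProcessor:
    """Base recipe: decides which HDUs of a file hold science images."""

    processors = []  # registry of concrete recipes, filled in as subclasses are defined

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        FitsProcessor.processors.append(cls)

    @classmethod
    def can_process(cls, hdus):
        raise NotImplementedError

    @classmethod
    def image_hdus(cls, hdus):
        raise NotImplementedError


class SingleExtensionFitsProcessor(FitsProcessor):
    """Image lives in one HDU only (index 1 here, by assumption)."""

    @classmethod
    def can_process(cls, hdus):
        return hdus[0][0].get("INSTRUME") == "SINGLECAM"  # invented value

    @classmethod
    def image_hdus(cls, hdus):
        return hdus[1:2]


class MultiExtensionFitsProcessor(FitsProcessor):
    """Every HDU that carries data is an image with its own header."""

    @classmethod
    def can_process(cls, hdus):
        return hdus[0][0].get("INSTRUME") == "MULTICAM"  # invented value

    @classmethod
    def image_hdus(cls, hdus):
        return [hdu for hdu in hdus if hdu[1] is not None]


def select_processor(hdus):
    """Return the first registered recipe that recognizes the file."""
    for processor in FitsProcessor.processors:
        if processor.can_process(hdus):
            return processor
    raise ValueError("no processor recognizes this file")


single = [({"INSTRUME": "SINGLECAM"}, None), ({}, "image"), ({}, "mask")]
multi = [({"INSTRUME": "MULTICAM"}, None), ({}, "ccd1"), ({}, "ccd2")]
```

An archive recipe would sit one level above this: unpack, then call `select_processor` on each extracted file.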

To do:

DinoBektesevic commented 3 years ago

Ok, fixed the last comments and added tests for Models.

The remaining issues here were punted to other issues, where we will tackle them. Mainly all is well except: we don't have good error handling or data update/replace/remove logic, and the MOA-II WCS is a placeholder pending error handling or getting the Astrometry.net solver working.

Looking for any last comments?

mrawls commented 3 years ago

Understood. Let's get it merged so we can find fun new ways to break things 😄