STScI-Citizen-Science / MTPipeline

Pipeline to produce CR rejected, astrodrizzled, png's of HST WFPC2 solar system data.
6 stars 1 forks source link

master_images table update and migration #148

Closed acviana closed 2 years ago

acviana commented 10 years ago

This ticket is for the work that needs to be done to the master_images table for the v1.0.0 release. The 3 areas we need to address are:

acviana commented 10 years ago

Let's start by looking at the schema. What need to be removed? What needs to be added? As we discussed last week:

Here is the current schema:

$ describe master_images;
+------------------+--------------+------+-----+---------+----------------+
| Field            | Type         | Null | Key | Default | Extra          |
+------------------+--------------+------+-----+---------+----------------+
| id               | int(11)      | NO   | PRI | NULL    | auto_increment |
| project_id       | int(11)      | YES  |     | NULL    |                |
| name             | varchar(50)  | YES  | UNI | NULL    |                |
| fits_file        | varchar(50)  | YES  | UNI | NULL    |                |
| object_name      | varchar(50)  | YES  |     | NULL    |                |
| set_id           | int(11)      | YES  |     | NULL    |                |
| set_index        | int(11)      | YES  |     | NULL    |                |
| width            | int(11)      | YES  |     | NULL    |                |
| height           | int(11)      | YES  |     | NULL    |                |
| minimum_ra       | double       | YES  |     | NULL    |                |
| minimum_dec      | double       | YES  |     | NULL    |                |
| maximum_ra       | double       | YES  |     | NULL    |                |
| maximum_dec      | double       | YES  |     | NULL    |                |
| pixel_resolution | float        | YES  |     | NULL    |                |
| priority         | int(11)      | YES  |     | NULL    |                |
| description      | varchar(50)  | YES  |     | NULL    |                |
| file_location    | varchar(100) | YES  |     | NULL    |                |
| visit            | varchar(11)  | YES  |     | NULL    |                |
| orbit            | int(11)      | YES  |     | NULL    |                |
| drz_mode         | varchar(6)   | YES  |     | NULL    |                |
| cr_mode          | varchar(6)   | YES  |     | NULL    |                |
+------------------+--------------+------+-----+---------+----------------+
21 rows in set (0.00 sec)
acviana commented 10 years ago

I think we should drop all the CQ specific fields. They don't anything to the MAST product. There are only 3 of these:

There are a couple of others that are just kinda useless and we should drop while we're cleaning house anyway:

Then there are some things that should be renamed and changed slightly: -object_name should be changed to jpl_object and refer to the name of the object in the HORIZON system.

Then there is bunch of stuff to add:

walyssonBarbosa commented 10 years ago

Changes to master_images:

Dropped:

I decided to leave file_location because this column is used in build_master_finders_table.py. I need to find a solution to find the file location before deleting this column.

Renamed object_name as jpl_object and fits_file as root name.

Added version. What type should this column be?

walyssonBarbosa commented 10 years ago

I didn't understand this part:

Should 3 (e.g. fits_type, png_type and ds9_region) columns be created?

asymone commented 10 years ago

Here are the suggested names for the FITS and PNG files. There are two sets let me know which one you prefer:

The base flt.fits or c0m.fits file.

From these last two we get 4 PNG files that need field names:

walyssonBarbosa commented 10 years ago

These are the ones I prefer:

walyssonBarbosa commented 10 years ago

But what columns show I add to master_images to store these fields?

Here is the current schema for master_images:

+------------------+--------------+------+-----+---------+----------------+
| Field            | Type         | Null | Key | Default | Extra          |
+------------------+--------------+------+-----+---------+----------------+
| id               | int(11)      | NO   | PRI | NULL    | auto_increment |
| project_id       | int(11)      | YES  |     | NULL    |                |
| name             | varchar(50)  | YES  | UNI | NULL    |                |
| rootname         | varchar(50)  | YES  | UNI | NULL    |                |
| jpl_object       | varchar(50)  | YES  |     | NULL    |                |
| width            | int(11)      | YES  |     | NULL    |                |
| height           | int(11)      | YES  |     | NULL    |                |
| minimum_ra       | double       | YES  |     | NULL    |                |
| minimum_dec      | double       | YES  |     | NULL    |                |
| maximum_ra       | double       | YES  |     | NULL    |                |
| maximum_dec      | double       | YES  |     | NULL    |                |
| pixel_resolution | float        | YES  |     | NULL    |                |
| file_location    | varchar(100) | YES  |     | NULL    |                |
| visit            | varchar(11)  | YES  |     | NULL    |                |
| orbit            | int(11)      | YES  |     | NULL    |                |
| drz_mode         | varchar(6)   | YES  |     | NULL    |                |
| cr_mode          | varchar(6)   | YES  |     | NULL    |                |
| version          | varchar(11)  | YES  |     | NULL    |                |
+------------------+--------------+------+-----+---------+----------------+
walyssonBarbosa commented 10 years ago
+------------------+--------------+------+-----+---------+----------------+
| Field            | Type         | Null | Key | Default | Extra          |
+------------------+--------------+------+-----+---------+----------------+
| id               | int(11)      | NO   | PRI | NULL    | auto_increment |
| project_id       | int(11)      | YES  |     | NULL    |                |
| name             | varchar(50)  | YES  | UNI | NULL    |                |
| rootname         | varchar(50)  | YES  | UNI | NULL    |                |
| jpl_object       | varchar(50)  | YES  |     | NULL    |                |
| width            | int(11)      | YES  |     | NULL    |                |
| height           | int(11)      | YES  |     | NULL    |                |
| minimum_ra       | double       | YES  |     | NULL    |                |
| minimum_dec      | double       | YES  |     | NULL    |                |
| maximum_ra       | double       | YES  |     | NULL    |                |
| maximum_dec      | double       | YES  |     | NULL    |                |
| pixel_resolution | float        | YES  |     | NULL    |                |
| file_location    | varchar(100) | YES  |     | NULL    |                |
| visit            | varchar(11)  | YES  |     | NULL    |                |
| orbit            | int(11)      | YES  |     | NULL    |                |
| fits_drz_mode    | varchar(6)   | YES  |     | NULL    |                |
| fits_cr_mode     | varchar(6)   | YES  |     | NULL    |                |
| png_drz_mode     | varchar(6)   | YES  |     | NULL    |                |
| png_cr_mode      | varchar(6)   | YES  |     | NULL    |                |
| ds               | varchar(6)   | YES  |     | NULL    |                |
| version          | varchar(11)  | YES  |     | NULL    |                |
+------------------+--------------+------+-----+---------+----------------+

Should the new schema be like this?

acviana commented 10 years ago

Here are the new fields that need to be added. They should all be sufficiently long varchar fields to allow the full file name without the path. They should all also have uniqueness constraints.

file name field name
rootname rootname
rootname_flt.fits flt
rootname_cr_flt.fits cr_flt
rootname_single_sci.fits single_sci
rootname_cr_single_sci.fits cr_single_sci
rootname_single_sci_linear.png single_sci_linear
rootname_cr_single_sci_linear.png cr_single_sci_linear
rootname_single_sci_log.png single_sci_log
rootname_cr_single_sci_log.png cr_single_sci_log
walyssonBarbosa commented 10 years ago

Ok. I added these fields in master_images, I only need to add this uniqueness constraint in their properties.

I used these commands to add them:

alter table master_images add flt varchar(50);
alter table master_images add cr_flt varchar(50);
alter table master_images add single_sci varchar(50);
alter table master_images add cr_single_sci varchar(50);
alter table master_images add single_sci_linear varchar(50);
alter table master_images add cr_single_sci_linear varchar(50);
alter table master_images add single_sci_log varchar(50);
alter table master_images add cr_single_sci_log varchar(50);

I added a name column in master_finders and populated the rows with the following command:

mysql> update master_finders, master_images set master_finders.name = master_images.name where master_finders.master_images_id = master_images.id;
Query OK, 727863 rows affected (11.16 sec)
Rows matched: 727863  Changed: 727863  Warnings: 0

Now I will start to rebuild the master_images table.

walyssonBarbosa commented 10 years ago

Before I made those changes I created a dump mtdump-2014-08-13.sql and copied to /astro/mtpipeline/database_dumps/.

walyssonBarbosa commented 10 years ago

I also dropped visit, orbit, cr_mode and drz_mode columns from master_images. And changed project_id to proposalid. This is the current schema for the table:

+----------------------+-------------+------+-----+---------+----------------+
| Field                | Type        | Null | Key | Default | Extra          |
+----------------------+-------------+------+-----+---------+----------------+
| id                   | int(11)     | NO   | PRI | NULL    | auto_increment |
| proposalid           | int(11)     | YES  |     | NULL    |                |
| name                 | varchar(50) | YES  | UNI | NULL    |                |
| rootname             | varchar(50) | YES  | UNI | NULL    |                |
| jpl_object           | varchar(50) | YES  |     | NULL    |                |
| width                | int(11)     | YES  |     | NULL    |                |
| height               | int(11)     | YES  |     | NULL    |                |
| minimum_ra           | double      | YES  |     | NULL    |                |
| minimum_dec          | double      | YES  |     | NULL    |                |
| maximum_ra           | double      | YES  |     | NULL    |                |
| maximum_dec          | double      | YES  |     | NULL    |                |
| pixel_resolution     | float       | YES  |     | NULL    |                |
| version              | varchar(11) | YES  |     | NULL    |                |
| flt                  | varchar(50) | YES  | UNI | NULL    |                |
| cr_flt               | varchar(50) | YES  | UNI | NULL    |                |
| single_sci           | varchar(50) | YES  | UNI | NULL    |                |
| cr_single_sci        | varchar(50) | YES  | UNI | NULL    |                |
| single_sci_linear    | varchar(50) | YES  | UNI | NULL    |                |
| cr_single_sci_linear | varchar(50) | YES  | UNI | NULL    |                |
| single_sci_log       | varchar(50) | YES  | UNI | NULL    |                |
| cr_single_sci_log    | varchar(50) | YES  | UNI | NULL    |                |
+----------------------+-------------+------+-----+---------+----------------+
walyssonBarbosa commented 10 years ago

I set master_images_id in master_finders as nullable and updated all rows as null so that I could rebuild master_images.

These are the commands I used:

alter table master_finders change master_images_id master_images_id int(11) null;
update master_finders set master_images_id = NULL;

Then I cleared the master_images table:

mysql> delete from master_images;
Query OK, 50888 rows affected (21.07 sec)
walyssonBarbosa commented 10 years ago

I changed the code and committed it. I tested it on /astro/mtpipeline/mtpipeline_outputs/wfpc2/11156_neptune/u9zy5503m_c0m.fits and it inserted a row in master_images.

I created a new db dump mtdump-2014-08-13_v2.sql and copied it to /astro/mtpipeline/database_dumps/.

asymone commented 10 years ago

After looking at the changes to database_interface.py, we've decided that we will build the output filenames a bit differently.

Example:

file = 'hlsp_mt_hst_wfpc2_{}_*_c0m.fits'.format(self.rootname)  
if os.exist(file):
      self.single_sci = file.replace('_c0m.fits', '_img.fits')
      self.cr_single_sci = file.replace('_c0m.fits', '_sci.fits')

I've pulled the update-master-finders branch and I'm currently working on these changes.