STScI-Citizen-Science / MTPipeline

Pipeline to produce CR rejected, astrodrizzled, png's of HST WFPC2 solar system data.
6 stars 1 forks source link

Update Database for 4th Delivery #76

Closed acviana closed 10 years ago

acviana commented 10 years ago

The database needs to be updated for the 4th delivery. Specifically, the set_id and set_index need to be updated so that they distinguish between the 4 different image treatments instead of treating each treatment as an iteration of the set_index counter.

Additionally we're going to add a cr_mode field to match the drizzle_mode field.

mysql> SELECT id, project_id, name, set_id, set_index, visit, orbit FROM master_images LIMIT 20;
+----+------------+-----------------------------------------------+--------+-----------+-------+-------+
| id | project_id | name                                          | set_id | set_index | visit | orbit |
+----+------------+-----------------------------------------------+--------+-----------+-------+-------+
|  1 |       5215 | u2mi0101t_c0m_center_single_sci_linear.png    |      1 |         1 | 1     |    10 |
|  2 |       5215 | u2mi0101t_c0m_wide_single_sci_linear.png      |      1 |         2 | 1     |    10 |
|  3 |       5215 | u2mi0101t_cr_c0m_center_single_sci_linear.png |      1 |         3 | 1     |    10 |
|  4 |       5215 | u2mi0101t_cr_c0m_wide_single_sci_linear.png   |      1 |         4 | 1     |    10 |
|  5 |       5215 | u2mi0102t_c0m_center_single_sci_linear.png    |      2 |         1 | 1     |    20 |
|  6 |       5215 | u2mi0102t_c0m_wide_single_sci_linear.png      |      2 |         2 | 1     |    20 |
|  7 |       5215 | u2mi0102t_cr_c0m_center_single_sci_linear.png |      2 |         3 | 1     |    20 |
|  8 |       5215 | u2mi0102t_cr_c0m_wide_single_sci_linear.png   |      2 |         4 | 1     |    20 |
|  9 |       5215 | u2mi0103t_c0m_center_single_sci_linear.png    |      3 |         1 | 1     |    30 |
| 10 |       5215 | u2mi0103t_c0m_wide_single_sci_linear.png      |      3 |         2 | 1     |    30 |
| 11 |       5215 | u2mi0103t_cr_c0m_center_single_sci_linear.png |      3 |         3 | 1     |    30 |
| 12 |       5215 | u2mi0103t_cr_c0m_wide_single_sci_linear.png   |      3 |         4 | 1     |    30 |
| 13 |       5215 | u2mi0104t_c0m_center_single_sci_linear.png    |      4 |         1 | 1     |    40 |
| 14 |       5215 | u2mi0104t_c0m_wide_single_sci_linear.png      |      4 |         2 | 1     |    40 |
| 15 |       5215 | u2mi0104t_cr_c0m_center_single_sci_linear.png |      4 |         3 | 1     |    40 |
| 16 |       5215 | u2mi0104t_cr_c0m_wide_single_sci_linear.png   |      4 |         4 | 1     |    40 |
| 17 |       5215 | u2mi0105t_c0m_center_single_sci_linear.png    |      5 |         1 | 1     |    50 |
| 18 |       5215 | u2mi0105t_c0m_wide_single_sci_linear.png      |      5 |         2 | 1     |    50 |
| 19 |       5215 | u2mi0105t_cr_c0m_center_single_sci_linear.png |      5 |         3 | 1     |    50 |
| 20 |       5215 | u2mi0105t_cr_c0m_wide_single_sci_linear.png   |      5 |         4 | 1     |    50 |
+----+------------+-----------------------------------------------+--------+-----------+-------+-------+
20 rows in set (0.00 sec)
acviana commented 10 years ago

A each set_id is a combination of project_name, visit, orbit, drizzle_mode and the soon to be added cr_mode. The value of each new set_id is max(set_id) + 1. The set_index can be set alphabetically by name since this is ostensibly time-ordered.

acviana commented 10 years ago

Updating the Database

In the database:

mysql> ALTER TABLE master_images ADD cr_mode varchar(6) DEFAULT NULL;
Query OK, 0 rows affected (0.31 sec)
Records: 0  Duplicates: 0  Warnings: 0

In the MasterImages class in the database_interface.py module :

...
cr_mode = Column(String(6))
...
acviana commented 10 years ago

I updated the database with this code snippet in an IPython Notebook.

master_images_query = session.query(MasterImages.id, MasterImages.name).all()
for counter, record in enumerate(master_images_query):
    cr_mode = record.name.split('_')[1]
    if cr_mode == 'c0m':
        cr_mode = 'no_cr'
    assert cr_mode in ['no_cr', 'cr'], 'Unexpected CR mode.'
    session.query(MasterImages).\
            filter(MasterImages.id == record.id).\
            update({'cr_mode':cr_mode})
    if counter % 100 == 0:
        print 'Updated {:,} records'.format(counter)
        session.commit()
session.commit()           

I tested the results with this query:

mysql> SELECT * FROM master_images WHERE cr_mode IS NULL;
Empty set (0.01 sec)
acviana commented 10 years ago

This query returns the unique combinations of these fields.

mysql> SELECT DISTINCT project_id, visit, orbit, drz_mode, cr_mode FROM master_images;
...
9328 rows in set (0.02 sec)

This removes all the values we need to ignore for the time being:

mysql> SELECT DISTINCT project_id, visit, orbit, drz_mode, cr_mode 
    FROM master_images 
    WHERE drz_mode != 'slice' 
    AND drz_mode != 'zoom';
...
7956 rows in set (0.02 sec)
acviana commented 10 years ago

I made the updates with this code:

distinct_query = session.query(MasterImages.project_id, MasterImages.visit, 
                               MasterImages.orbit, MasterImages.drz_mode,
                               MasterImages.cr_mode).\
                         filter(MasterImages.drz_mode != 'slice').\
                         filter(MasterImages.drz_mode != 'zoom').\
                         distinct().\
                         all()

query_all = session.query(MasterImages.id, MasterImages.project_id, 
                          MasterImages.visit, MasterImages.orbit, 
                          MasterImages.drz_mode, MasterImages.cr_mode).all()

for record_counter, record in enumerate(query_all):
    key = (record.project_id, record.visit, record.orbit, 
           record.drz_mode, record.cr_mode)
    if set_dict.has_key(key):
        set_id = set_dict[key]['set_id']
        set_index = set_dict[key]['set_index']
        set_dict[key]['set_index'] += 1
    else:
        set_id = None
        set_index = None
    session.query(MasterImages).\
            filter(MasterImages.id == record.id).\
            update({'set_id':set_id, 'set_index':set_index})
    if record_counter % 100 == 0:
        session.commit()
        print 'Updated {:,} records'.format(record_counter)
session.commit()
acviana commented 10 years ago

This query verifies that things are working correctly:

mysql> SELECT id, name, project_id, visit, orbit, drz_mode, cr_mode, set_id, set_index 
    FROM master_images 
    WHERE set_id = 7876;
+-------+-----------------------------------------------+------------+-------+-------+----------+---------+--------+-----------+
| id    | name                                          | project_id | visit | orbit | drz_mode | cr_mode | set_id | set_index |
+-------+-----------------------------------------------+------------+-------+-------+----------+---------+--------+-----------+
| 12800 | ub96330cm_cr_c0m_center_single_sci_linear.png |      11956 | 33    |     4 | center   | cr      |   7876 |         1 |
| 12856 | ub963305m_cr_c0m_center_single_sci_linear.png |      11956 | 33    |     4 | center   | cr      |   7876 |         2 |
| 12860 | ub963307m_cr_c0m_center_single_sci_linear.png |      11956 | 33    |     4 | center   | cr      |   7876 |         3 |
| 12864 | ub963309m_cr_c0m_center_single_sci_linear.png |      11956 | 33    |     4 | center   | cr      |   7876 |         4 |
| 12956 | ub963304m_cr_c0m_center_single_sci_linear.png |      11956 | 33    |     4 | center   | cr      |   7876 |         5 |
| 12960 | ub963306m_cr_c0m_center_single_sci_linear.png |      11956 | 33    |     4 | center   | cr      |   7876 |         6 |
| 12964 | ub963308m_cr_c0m_center_single_sci_linear.png |      11956 | 33    |     4 | center   | cr      |   7876 |         7 |
| 12968 | ub96330am_cr_c0m_center_single_sci_linear.png |      11956 | 33    |     4 | center   | cr      |   7876 |         8 |
| 12972 | ub96330bm_cr_c0m_center_single_sci_linear.png |      11956 | 33    |     4 | center   | cr      |   7876 |         9 |
+-------+-----------------------------------------------+------------+-------+-------+----------+---------+--------+-----------+
9 rows in set (0.00 sec)
acviana commented 10 years ago

The new database dump has been posted to the FTP site as mtdump-2014-02-13.dump