LSSTDESC / SSim_DC1

Configuration, production, validation specifications and tools for the DC1 Data Set.
4 stars 2 forks source link

Ingest DC1 coadd catalogs into a MySQL database #30

Closed jchiang87 closed 6 years ago

jchiang87 commented 7 years ago

I've done this for the imsim undithered data, the Level 2 output of which is available at NERSC:

/global/cscratch1/sd/descdm/DC1/full_focalplane_undithered

For now, I've put the data in the DESC_Twinkles_Level_2 database on scidb1.nersc.gov. NERSC has provided a DESC_DC1_Level_2 database on nerscdb04.nersc.gov, but that db isn't configured to accept loading via csv files (which is faster by O(10) than insert commands), so I've used the Twinkles db until the DC1 db is reconfigured.

The schema for the Coadd_Object table is derived from the FITS tables in the coadd merged object catalogs, e.g., /global/cscratch1/sd/descdm/DC1/full_focalplane_undithered/deepCoadd-results/merged/0/10,10/ref-0-10,10.fits. I've added columns (as part of the primary key along with the id column) to contain the patch and projectId, respectively. The patch can serve as part of a rough spatial query, in addition to identifying the coadd image where an object can be found. projectId will be used to differentiate the Level 2 catalog results between the imsim undithered, imsim dithered, and phosim DC1 datasets.

Here is example code showing how to obtain query results as a pandas DataFrame:

import desc.pserv

#db_info = dict(host='nerscdb04.nersc.gov',                                      
#               database='DESC_DC1_Level_2')                                     
db_info = dict(host='scidb1.nersc.gov',
               database='DESC_Twinkles_Level_2')

connection = desc.pserv.DbConnection(**db_info)

query = '''select * from Coadd_Object co join Project pr on                      
           co.projectId=pr.projectId where co.patch="'10,10'" and                
           pr.projectName="DC1 imsim undithered" limit 10'''

#query = '''select * from Coadd_Object where patch="'10,10'" and projectId=0'''  

df = connection.get_pandas_data_frame(query)

The latter, commented-out query can be used to avoid the join with the Project table if you know the projectId. Instructions for setting up and using the pserv package are available at the pserv repo.

jchiang87 commented 7 years ago

The schema for the Project table is at the pserv repo.

jchiang87 commented 7 years ago

The imsim undithered coadd catalogs have been ingested into the DESC_DC1_Level_2 database:

MySQL [DESC_DC1_Level_2]> select count(*) from Coadd_Object;
+----------+
| count(*) |
+----------+
| 11288832 |
+----------+
1 row in set (1 min 14.25 sec)

MySQL [DESC_DC1_Level_2]> 

I will drop the corresponding table in the DESC_Twinkles_Level_2 db.

jchiang87 commented 7 years ago

I've ingested the imsim dithered coadd catalogs from

/global/cscratch1/sd/descdm/DC1/DC1-imsim-dithered

at NERSC into the DESC_DC1_Level_2 database:

MySQL [DESC_DC1_Level_2]> select * from Project;
+-----------+----------------------+
| projectId | projectName          |
+-----------+----------------------+
|         0 | DC1 imsim undithered |
|         1 | DC1 imsim dithered   |
+-----------+----------------------+
2 rows in set (0.01 sec)
MySQL [DESC_DC1_Level_2]> select count(*) from Coadd_Object where projectId=1;
+----------+
| count(*) |
+----------+
| 12311932 |
+----------+
1 row in set (3 min 5.85 sec)
jchiang87 commented 7 years ago

The phosim coadd catalog loading has finished.

cwwalter commented 7 years ago

After the imSim background jobs finish we should re-ingest with analysis flags also decoded/ingested.

cwwalter commented 7 years ago

@jchiang87 Are we ready to re-ingist the corrected dithered and undithered imSim runs? I sent you a notebook last week with several ways to get at the flags. Did you get that, or are you already clear anyway on what we need to do?

cwwalter commented 6 years ago

@jchiang87 I think we can close this now correct? Also, since you have mad the new versions with the flags etc, I think you put them in the new project area at NERSC correct? So, I think I can delete mine in the old project area but I just wanted to check.

jchiang87 commented 6 years ago

yes, sure we can close this.