LSSTDESC / DC2-production

Configuration, production, validation specifications and tools for the DC2 Data Set.
BSD 3-Clause "New" or "Revised" License

Summary Truth catalogs for stars #386

Closed jchiang87 closed 3 years ago

jchiang87 commented 4 years ago

We will have variability truth tables for stars that report the model fluxes in each visit, but it would be useful to have a table that summarizes the variability properties for the stars as well. This issue will be used to gather input on the table columns to provide and to track the implementation and delivery.

jchiang87 commented 4 years ago

Checking the varParamStr column for all of the entries in our star db file, /global/projecta/projectdirs/lsst/groups/SSim/DC2/dc2_stellar_healpixel.db, it appears that there are 3 models for stellar variability that we are using, and entries where varParamStr == None, which I assume means a non-variable star. Here is the breakdown:

Model          Number of entries
'MLT'                   11296954
'kplr'                   8366862
'applyRRly'                 1196
'None'                    604961
Total                   20269973

The different models are implemented in https://github.com/lsst/sims_catUtils/blob/master/python/lsst/sims/catUtils/mixins/VariabilityMixin.py, including some we aren't using.
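A breakdown like the one above could be tallied with a short script along these lines. This is only a sketch, not the query that was actually run: it assumes the table is named stars (as in the later comments), that varParamStr is a JSON string whose "m" key holds the model name, and that non-variable stars carry either SQL NULL or the literal string 'None'. The in-memory rows are made up for illustration.

```python
import json
import sqlite3
from collections import Counter


def count_variability_models(conn, table="stars"):
    """Tally the variability model named in each row's varParamStr."""
    counts = Counter()
    for (var_param_str,) in conn.execute(f"SELECT varParamStr FROM {table}"):
        if var_param_str is None or var_param_str == "None":
            # Assumed convention for non-variable stars.
            counts["None"] += 1
        else:
            # Assumed layout: {"m": <model name>, "p": {...parameters...}}
            counts[json.loads(var_param_str)["m"]] += 1
    return counts


# Tiny in-memory stand-in for dc2_stellar_healpixel.db (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stars (simobjid int, varParamStr text)")
conn.executemany(
    "INSERT INTO stars VALUES (?, ?)",
    [
        (1, '{"m": "MLT", "p": {}}'),
        (2, '{"m": "kplr", "p": {}}'),
        (3, '{"m": "MLT", "p": {}}'),
        (4, None),
        (5, "None"),
    ],
)
model_counts = count_variability_models(conn)
```

For the real file, the connection would point at the db path instead of ":memory:"; with roughly 20 million rows, iterating over the cursor as done here avoids loading everything at once.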

BrunoSanchez commented 4 years ago

This is useful. I was actually looking at /global/cscratch1/sd/descim/star_truth/star_truth_summary.db, and it has the following tables and columns:

CREATE TABLE column_descriptions
    (name text, description text, dtype text);
CREATE TABLE truth_summary
        (id TEXT, host_galaxy BIGINT, ra DOUBLE, dec DOUBLE,
        redshift FLOAT, is_variable INT, is_pointsource INT,
        flux_u FLOAT, flux_g FLOAT, flux_r FLOAT,
        flux_i FLOAT, flux_z FLOAT, flux_y FLOAT,
        flux_u_noMW FLOAT, flux_g_noMW FLOAT, flux_r_noMW FLOAT,
        flux_i_noMW FLOAT, flux_z_noMW FLOAT, flux_y_noMW FLOAT);

For the table you are referring to, it has the following schema:

                     simobjid int, htmid_6 int, ra real, decl real,
                     gal_l real, gal_b real, magNorm real,
                     mura real, mudecl real, parallax real,
                     ebv real, radialVelocity real, varParamStr text,
                     sedFilename text,
                     umag real, gmag real, rmag real, imag real,
                     zmag real, ymag real, hpid int);

What is the relationship between them? Are these linked by the stars.simobjid==truth_summary.id? Or should I not use the first table?

jchiang87 commented 4 years ago

What is the relationship between them?

The tables in star_truth_summary.db are derived from the info in dc2_stellar_healpixel.db

Are these linked by the stars.simobjid==truth_summary.id?

Yes, that's right. Unfortunately, the instance catalog (and centroid file) ids have a further encoding: uniqueId = stars.simobjid*1024 + 4, presumably to mirror the uniqueId construction for the separate galaxy components. We were planning to update the star_truth_summary.db file with the instance catalog ids, but if it is more useful to match ids in the dc2_stellar_healpixel.db file, it's probably better to leave the star_truth_summary.db file as-is. Comments welcome on this!

Or should I not use the first table?

I think it's ok to use either table.
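For anyone matching instance catalog or centroid file ids back to these tables, the encoding quoted above (uniqueId = stars.simobjid*1024 + 4) can be applied and inverted as in the sketch below. The function and constant names are made up for illustration; only the multiplier 1024 and the offset 4 come from this thread.

```python
ID_MULTIPLIER = 1024  # from the thread: uniqueId = simobjid*1024 + 4
STAR_OFFSET = 4       # offset quoted in the thread for stars


def to_unique_id(simobjid: int) -> int:
    """Instance catalog uniqueId for a star with the given simobjid."""
    return simobjid * ID_MULTIPLIER + STAR_OFFSET


def to_simobjid(unique_id: int) -> int:
    """Recover simobjid from a star's uniqueId; reject ids with another offset."""
    simobjid, offset = divmod(unique_id, ID_MULTIPLIER)
    if offset != STAR_OFFSET:
        raise ValueError(f"uniqueId {unique_id} does not carry the star offset")
    return simobjid


unique_id = to_unique_id(694090)
round_trip = to_simobjid(unique_id)
```

Since truth_summary.id is declared as TEXT, the result would need a str(...) conversion before being compared against that column.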

jchiang87 commented 4 years ago

I've created an sqlite3 db table with the means and standard deviations of the delta_mag values produced by the lsst_sims stellar variability code. The sqlite3 file at NERSC is /global/cscratch1/sd/jchiang8/desc/Run2.2i/stellar_variability/merged_star_db/star_lc_stats.db, and it contains a table with this schema:

CREATE TABLE stellar_variability
              (id TEXT, model TEXT, mean_u, mean_g, mean_r,
               mean_i, mean_z, mean_y, stdev_u, stdev_g,
               stdev_r, stdev_i, stdev_z, stdev_y);

The id values are the same as in /global/cscratch1/sd/descim/star_truth/star_truth_summary.db.

Here are hexbin plots of mean(delta_mag_i) (=mean_i) vs std(delta_mag_i) (=stdev_i) for each of the three non-constant models:

[Figure: stellar_variability_stats]

and randomly selected example light curves for the kplr and applyRRly models:

[Figure: kplr_example_40965014666]
[Figure: applyRRly_example_694090]

Here is the code to produce the stellar_variability table.

@BrunoSanchez Let me know if this looks useful! Suggestions welcome.
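To make the column definitions concrete, here is a minimal sketch of how one row of the stellar_variability table could be filled from per-visit delta_mag values. It is not the code referred to above. The helper name, the id, and the light curve are hypothetical, and the use of the population standard deviation is an assumption, since the thread does not say which estimator was used.

```python
import sqlite3
from statistics import fmean, pstdev

BANDS = "ugrizy"


def summarize_delta_mags(delta_mags):
    """Return (means, stdevs) in ugrizy order from a dict of band -> delta_mag list."""
    means = [fmean(delta_mags[band]) for band in BANDS]
    # Population standard deviation; an assumption, not confirmed by the thread.
    stdevs = [pstdev(delta_mags[band]) for band in BANDS]
    return means, stdevs


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE stellar_variability
       (id TEXT, model TEXT, mean_u, mean_g, mean_r,
        mean_i, mean_z, mean_y, stdev_u, stdev_g,
        stdev_r, stdev_i, stdev_z, stdev_y)"""
)

# Hypothetical achromatic light curve: the same delta_mag values in every band.
light_curve = [0.0, 0.1, -0.1, 0.2]
delta_mags = {band: light_curve for band in BANDS}
means, stdevs = summarize_delta_mags(delta_mags)

placeholders = ", ".join(["?"] * 14)
conn.execute(
    f"INSERT INTO stellar_variability VALUES ({placeholders})",
    ["12345", "kplr", *means, *stdevs],
)
row = conn.execute(
    "SELECT mean_i, stdev_i, stdev_u FROM stellar_variability"
).fetchone()
```

With an achromatic input like this one, every stdev_* column comes out the same, which is the behaviour discussed for the kplr model further down the thread.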

jchiang87 commented 4 years ago

The catalog that contains the stellar parameters that we've been using for DC2, /global/projecta/projectdirs/lsst/groups/SSim/DC2/dc2_stellar_healpixel.db, covers a much larger area than the DC2 300 sq deg region. Here's a hexbin plot of the ra, decl values in that file:

[Figure: Run2_star_selection]

The dashed line is the DC2 boundary. Since our instance catalogs use a radius of 2.1 degrees, we generate data outside of the DC2 boundary by that amount, so I've defined the dotted region, whose boundary is at least 2.1 degrees outside of the DC2 region. There are 6883094 stars in that dotted region. For the catalogs I'll be pointing to later today, I've restricted the data to those objects.
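The 2.1 degree padding follows from the instance catalog radius: any star within that angular distance of a pointing can end up in a catalog. A rough sketch of that per-pointing test is below. It does not reproduce the dotted selection region itself, whose vertices are not given here. The function names and the example coordinates are hypothetical, and the haversine formula is simply one standard way to compute the separation.

```python
import math

INSTCAT_RADIUS_DEG = 2.1  # instance catalog radius quoted in the thread


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec**2 + math.cos(dec1) * math.cos(dec2) * sin_dra**2
    # Clamp to guard against rounding slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def in_instance_catalog(star_ra, star_dec, pointing_ra, pointing_dec,
                        radius_deg=INSTCAT_RADIUS_DEG):
    """True if the star lies within the instance catalog radius of the pointing."""
    sep = angular_separation_deg(star_ra, star_dec, pointing_ra, pointing_dec)
    return sep <= radius_deg


# Hypothetical pointing and stars, for illustration only.
inside = in_instance_catalog(60.0, -31.0, 60.0, -30.0)
outside = in_instance_catalog(60.0, -33.0, 60.0, -30.0)
```

Note that a fixed offset in ra corresponds to a smaller angle on the sky away from the equator, which is why the separation is computed on the sphere rather than by comparing ra and decl differences directly.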

BrunoSanchez commented 4 years ago

Thanks so much Jim. I just saw this, sorry. I will check this out and let you know.

jchiang87 commented 4 years ago

I've prepared two new files, both in /global/homes/j/jchiang8/scratch/desc/Run2.2i/stellar_variability:

BrunoSanchez commented 4 years ago

Hi Jim, I have found that the star standard deviations are identical for the 4 bandpasses. This might be an error. Still, though it is not an urgent matter, it would be nice to have those values. Thanks.

jchiang87 commented 4 years ago

They will be identical for the kplr models (see the light curve plot above). Is that also true for the applyRRly stars?

FWIW, here is the implementation for the kplr stars where the delta_mag values for all six bands are set identically: https://github.com/lsst/sims_catUtils/blob/master/python/lsst/sims/catUtils/mixins/VariabilityMixin.py#L1332

BrunoSanchez commented 4 years ago

Ok, sorry. This is correct. If kplr stands for transits, then the variability should be achromatic; I forgot about this.

BrunoSanchez commented 4 years ago

I know where the confusion is coming from. When I join the stellar_variability stats table with the truth_summary table, filtered by is_variable==1, using the column id, the only models that I get crossmatched are MLT and kplr. Is there an obvious reason for this?
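The join described above can be written roughly as follows. This is a sketch on made-up rows in a single in-memory database, with only the columns needed for the join. With the real files, star_lc_stats.db and star_truth_summary.db are separate sqlite3 files, so one of them would have to be attached to the other connection with ATTACH DATABASE first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; only the columns used in the join are created.
conn.executescript(
    """
    CREATE TABLE truth_summary (id TEXT, is_variable INT);
    CREATE TABLE stellar_variability (id TEXT, model TEXT);
    INSERT INTO truth_summary VALUES ('1', 1), ('2', 1), ('3', 0), ('4', 1);
    INSERT INTO stellar_variability VALUES
        ('1', 'MLT'), ('2', 'kplr'), ('3', 'None'), ('4', 'applyRRly');
    """
)

query = """
    SELECT sv.model, COUNT(*)
    FROM stellar_variability AS sv
    JOIN truth_summary AS ts ON sv.id = ts.id
    WHERE ts.is_variable = 1
    GROUP BY sv.model
    ORDER BY sv.model
"""
matched = dict(conn.execute(query).fetchall())
```

Counting per model like this shows directly which models survive the crossmatch; adding a cut on ra and dec to the WHERE clause would show whether a model drops out only because of the sky region selected.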

jchiang87 commented 4 years ago

There should be applyRRly stars that are also matched. The code that sets the is_variable flag just looks for varParamStr which does exist for the applyRRly objects. I'll look into it.

BrunoSanchez commented 4 years ago

Well, I have counted only 371 RRLyr stars, and none of them falls in the box I am working in. So this explains it. Sorry for bothering you!

[Figure: rrlyrae_radec]

jchiang87 commented 3 years ago

These tables have been generated and are available via Postgres at NERSC. An example notebook showing how to access those tables is available. That notebook is also linked to the DC2 Data Product Overview page.