Closed danielsf closed 6 years ago
@jbkalmbach I would like to be able to pass this code on to others so that they can start thinking about how to generate a truth catalog for cosmoDC2. Do you mind giving this PR a quick review? Thanks.
If you want to come by and chat about this, feel free. It was a very stream-of-consciousness design leading up to the CMU meeting. It's possible that you are right and I should just implement your comments.
;)
Thanks for the quick review of the new commits.
You might see a few more come through in the next few hours. I need to test that the sprinkled flag works correctly. I also want to include the unsprinkled SNe before I hand this off to Eve and Yao. Don't feel the need to review things as they come through, but if you notice something odd, let me know.
@danielsf I encountered an issue when trying to make the light curve reader work (cf. https://github.com/LSSTDESC/gcr-catalogs/issues/203). I was assuming that all objects that are present in variables_and_transients would also be present in light_curves, but apparently that is not the case.
Here's an example with /global/cscratch1/sd/danielsf/Run1.2_truth/run_1.2_trial_light_curves_181004.db:
sqlite> select uniqueId from variables_and_transients limit 1;
9262
sqlite> select obshistid from light_curves where uniqueId=9262;
sqlite>
Is this expected or is this due to a bug in the truth catalog generation?
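To get a sense of how many objects are affected, an anti-join along the following lines should list every uniqueId that has no rows in light_curves. This is only a minimal sketch run against a toy in-memory database: the table names and the uniqueId and obshistid columns come from the queries above, while the helper name ids_without_light_curves and the toy rows are made up for illustration, and the real tables presumably carry more columns.

```python
import sqlite3


def ids_without_light_curves(conn):
    """Return uniqueIds in variables_and_transients with no light_curves rows."""
    query = """
        SELECT v.uniqueId
        FROM variables_and_transients AS v
        LEFT JOIN light_curves AS lc ON lc.uniqueId = v.uniqueId
        WHERE lc.uniqueId IS NULL
        ORDER BY v.uniqueId
    """
    return [row[0] for row in conn.execute(query)]


# Toy stand-in for the real database file (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variables_and_transients (uniqueId INTEGER)")
conn.execute("CREATE TABLE light_curves (uniqueId INTEGER, obshistid INTEGER)")
conn.executemany(
    "INSERT INTO variables_and_transients VALUES (?)",
    [(9262,), (9263,), (9264,)],
)
# Only object 9263 is ever observed in this toy example.
conn.executemany(
    "INSERT INTO light_curves VALUES (?, ?)",
    [(9263, 1), (9263, 2)],
)

missing = ids_without_light_curves(conn)
print(missing)
```

Pointing sqlite3.connect at the actual .db file instead of ":memory:" (and dropping the toy inserts) would run the same query against the real catalog.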
@yymao It's not unexpected (which doesn't mean it isn't also a bug).
When I populate the variables_and_transients table, I just write out all of the objects in our simulated population. This neglects two details:
1) To make it into the light_curves table, an object actually has to appear on a detector. It is possible that there are supernovae that explode, rise, and fade before our simulated survey ever looks at them (or while it is looking away).
2) The population of simulated SNe we used for protoDC2 covers an area of sky larger than the protoDC2 area, so, even if we caught every supernova in protoDC2, there are going to be supernovae that do not appear in light_curves because they are outside of our survey area.
If this population of unrealized light curves is a problem for you, I can re-create the variables catalog without them (I was unaware of (2) when I wrote the code to create the truth catalog; it should be straightforward to omit the objects that run afoul of (1)).
Let me know.
I can work around this in the reader now that I know it is expected. However, this means that if a user asks for the light curves of all variable objects, say within an RA range, the user may get empty arrays for some objects. It's not very intuitive, but it's OK.
How feasible is it to add a column to the variables_and_transients table that contains the number of rows in the light_curves table for that object (i.e., the number of detections of that object), so that the user can select on this column?
Adding an n_detections column should be pretty easy. Do you want that for Run 1.2 or should we add it to the list for Run 2.0?
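For reference, one way an n_detections column could be filled in after the fact is sketched below. It is an assumption-laden illustration, not the code used to build the truth catalog: the function name add_n_detections and the toy rows are hypothetical, the schema is reduced to the columns mentioned in this thread, and it simply counts the matching light_curves rows for each object with a correlated subquery.

```python
import sqlite3


def add_n_detections(conn):
    """Add n_detections to variables_and_transients and fill it from light_curves."""
    conn.execute(
        "ALTER TABLE variables_and_transients ADD COLUMN n_detections INTEGER"
    )
    # For every object, count its rows in light_curves (0 if it has none).
    conn.execute(
        """
        UPDATE variables_and_transients
        SET n_detections = (
            SELECT COUNT(*)
            FROM light_curves
            WHERE light_curves.uniqueId = variables_and_transients.uniqueId
        )
        """
    )
    conn.commit()


# Toy stand-in for the real database file (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variables_and_transients (uniqueId INTEGER)")
conn.execute("CREATE TABLE light_curves (uniqueId INTEGER, obshistid INTEGER)")
conn.executemany(
    "INSERT INTO variables_and_transients VALUES (?)",
    [(9262,), (9263,), (9264,)],
)
conn.executemany(
    "INSERT INTO light_curves VALUES (?, ?)",
    [(9263, 1), (9263, 2), (9264, 7)],
)

add_n_detections(conn)

counts = dict(
    conn.execute("SELECT uniqueId, n_detections FROM variables_and_transients")
)
# Users could then select only the objects that actually have light curves.
with_curves = [
    row[0]
    for row in conn.execute(
        "SELECT uniqueId FROM variables_and_transients "
        "WHERE n_detections > 0 ORDER BY uniqueId"
    )
]
print(counts, with_curves)
```

With a column like this in place, a reader could filter on n_detections > 0 up front instead of returning empty arrays for objects that were never observed.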
Doesn't matter to me actually, in terms of getting the reader to work. The reader code would be essentially the same. It's just that without this column the users won't be able to select only objects that have light curves.
So I guess the question is whether people need this for their use cases.
If no one objects, I am going to merge this around noon PDT on October 11, just so that we have a starting point for improving the truth catalog infrastructure.
This PR contains the code I used to generate the protoDC2 truth catalogs. A lot of it (especially the time domain part) is not ready for public consumption (we need to have an intentional discussion about how we are going to provide truth information for time-varying sources). I would like to merge this, however, so that I have a baseline to work against when I start developing the cosmoDC2 truth catalogs.