LSSTDESC / derp

A first attempt at a simple LSST DRP catalog emulator
BSD 3-Clause "New" or "Revised" License

DC2 Run 1.1p experiment: making X and y from the truth and DRP Object tables #4

Closed drphilmarshall closed 6 years ago

drphilmarshall commented 6 years ago

Here's a first go at a derp.Emulator class, and a notebook that derives and demos its make_training_set method. Not much to look at, but just following @danielsf and @yymao's DC2 tutorials gets us a useful-looking design matrix X and corresponding (multivariate) response variables y. Can you take a quick look at this please @jiwoncpark, and let me know if I have made the right things for the pytorch models to ingest? All comments welcome - it's a bit rough round the edges, but then it is sprint week.

This closes #2 , when merged.

jiwoncpark commented 6 years ago

I've run the notebook and all looks great! As I mentioned in a comment, I think make_training_set may need a helper function (called something like _preprocess_for_training) that gets rid of null values, etc.
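A minimal sketch of what such a helper could look like, assuming X and y are pandas DataFrames that share the same row index. The body and the toy column names are illustrative assumptions, not the derp implementation; only the name _preprocess_for_training comes from the suggestion above.

```python
import numpy as np
import pandas as pd


def _preprocess_for_training(X, y):
    """Drop examples with null or non-finite values in either X or y.

    Hypothetical helper: assumes X and y are DataFrames with a shared index.
    """
    # Treat +/-inf as missing so that a single mask catches both cases.
    X_clean = X.replace([np.inf, -np.inf], np.nan)
    y_clean = y.replace([np.inf, -np.inf], np.nan)
    # Keep a row only if it is complete in both tables, so X and y stay aligned.
    keep = X_clean.notna().all(axis=1) & y_clean.notna().all(axis=1)
    return X_clean.loc[keep], y_clean.loc[keep]


# Toy data; the column names are placeholders chosen for illustration.
X = pd.DataFrame({"mag_true_r": [22.1, np.nan, 24.3, 23.0]})
y = pd.DataFrame({"mag_r_cModel": [22.3, 21.9, np.inf, 23.2]})
X_train, y_train = _preprocess_for_training(X, y)
```

Filtering both tables with one shared mask keeps each row of X paired with its row of y, which matters once the tables are handed to a model.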

Also, since this is one "chunk" of the sky, eventually we'll need to wrap make_training_set in another function or loop that points at a list of regions, right? We may need to parallelize this (if we want a few million examples) but this should be straightforward!

Out of curiosity, would you want to keep working with a single-tract CoAdd? I'm wondering how much variability in optical PSF there is across tracts in minion_1016 if any. Will this tract be representative of other tracts?

drphilmarshall commented 6 years ago

Thanks @jiwoncpark !

> Maybe object_id shouldn't be in X. You'll need it for joining with the extragalactic table though. I'll issue a helper function we might need to preprocess the X and y further for training, and the object_id deletion can go in there if you'd like!

That sounds good: X and y should be training-ready, I think - so I guess we just need to carry the IDs around as well, but as separate dataframes. I'm not sure we need another function, but let's see! Thanks for issuing that separately in #5.
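One way to carry the IDs around separately, sketched under the assumption that X is a pandas DataFrame containing an object_id column. The function name split_ids and the feature column are hypothetical, introduced only for illustration.

```python
import pandas as pd


def split_ids(X, id_column="object_id"):
    """Return (ids, features): the ID column as its own DataFrame, and X without it.

    Hypothetical helper, not part of derp.
    """
    ids = X[[id_column]].copy()
    features = X.drop(columns=[id_column])
    return ids, features


# Toy design matrix; the values and the feature name are placeholders.
X = pd.DataFrame({"object_id": [101, 102, 103],
                  "mag_true_r": [22.1, 23.4, 24.0]})
ids, X_features = split_ids(X)

# Both pieces keep the original row index, so the IDs can be re-attached
# later, for example before joining with another table.
rejoined = ids.join(X_features)
```

Because the index is preserved, no separate bookkeeping is needed to match a training row back to its object.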

> Separately, is <band>_modelfit_CModel_fracDev the flux ratio of bulge to total?

Yes, I think so. I was just looking for some familiar properties that might be predictable by an ANN.

> Also, since this is one "chunk" of the sky, eventually we'll need to wrap make_training_set in another function or loop that points at a list of regions, right? We may need to parallelize this (if we want a few million examples) but this should be straightforward!

Yes - I had not got that far... I was assuming we'd be able to refactor and it'd be easy :-)
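A rough sketch of the wrapper being discussed, assuming each region is a tract and that the per-tract call returns a pair of pandas DataFrames. The function make_training_set_for_tract is a toy stand-in for the real method, and the tract numbers and column names are made up. A thread pool is used here on the assumption that the work is dominated by catalog reads; a process pool would be the analogous choice for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def make_training_set_for_tract(tract):
    """Hypothetical stand-in for Emulator.make_training_set on one tract.

    Returns toy (X, y) DataFrames; a real version would query the catalogs.
    """
    X = pd.DataFrame({"tract": [tract, tract], "mag_true_r": [22.0, 23.0]})
    y = pd.DataFrame({"mag_r_cModel": [22.1, 23.2]})
    return X, y


def make_training_set_over_tracts(tracts, max_workers=4):
    """Build (X, y) for each tract in parallel and stack the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the order of `tracts`, so the stacked
        # X and y stay row-aligned.
        results = list(pool.map(make_training_set_for_tract, tracts))
    X = pd.concat([r[0] for r in results], ignore_index=True)
    y = pd.concat([r[1] for r in results], ignore_index=True)
    return X, y


# Placeholder tract numbers, for illustration only.
X_all, y_all = make_training_set_over_tracts([4850, 4851, 4852])
```

Keeping the per-tract function unchanged and adding the loop around it means the single-tract notebook workflow keeps working as before.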

> Out of curiosity, would you want to keep working with a single-tract CoAdd? I'm wondering how much variability in optical PSF there is across tracts in minion_1016 if any. Will this tract be representative of other tracts?

Oh! Yes, I was thinking of trying to work with the whole catalog, not just a single tract - that's why I started %%time-ing things, in fact... I'll try this.

drphilmarshall commented 6 years ago

Thanks very much, @jiwoncpark ! Next up: training a model :-)