Open gatoatigrado opened 5 years ago
bcolz_source = plasticc_data.PlasticcBcolzSource.get_default()
meta_table = bcolz_source.get_table('training_set_meta')
def gen_ids(meta_table):
    # Stream object IDs from the metadata table one row at a time.
    meta_map = meta_table.where('object_id > 0', outcols=['object_id'])
    for row in meta_map:
        yield row.object_id
Another option is to load all IDs into memory at once:
all_ids = meta_table['object_id'][:]
The TF learned alignment model will output a vector for each point in a light curve.
We'd like to train a model to generate these vectors, such that the correct alignment produces high dot products between correctly aligned samples.
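To make the objective concrete, here is a minimal sketch of how per-point vectors could be scored against a candidate alignment. The `alignment_score` helper, the integer `offset` parameterization, and the toy 2-d vectors are all assumptions for illustration; the real model and alignment parameters may look quite different.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def alignment_score(left_vectors, right_vectors, offset):
    """Mean dot product between left[i] and right[i + offset] over valid i.

    Hypothetical scoring rule: a higher score means the offset pairs up
    points whose learned vectors agree.
    """
    pairs = [
        (left_vectors[i], right_vectors[i + offset])
        for i in range(len(left_vectors))
        if 0 <= i + offset < len(right_vectors)
    ]
    if not pairs:
        return float("-inf")
    return sum(dot(u, v) for u, v in pairs) / len(pairs)


# Toy per-point vectors: the right curve is the left one delayed by one point.
left = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
right = [[0.0, -1.0]] + left

scores = {offset: alignment_score(left, right, offset) for offset in (0, 1)}
best_offset = max(scores, key=scores.get)
```

In this toy case the true offset of 1 pairs each left vector with an identical right vector, so it scores higher than the misaligned offset of 0.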
A synthetic dataset for training this model should generate negatives and positives.
For this, we should use tf.data.Dataset, a modern TensorFlow API for representing a stream of input data. I think this link has a decent introduction to the API with examples: http://adventuresinmachinelearning.com/tensorflow-dataset-tutorial/. At the end we'd like an input_fn() which returns a tf.data.Dataset instance, probably with output_shapes and output_types where 'label' is 0 for negatives and 1 for positives.
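A rough sketch of what such an input_fn() might look like, assuming a plain Python generator wrapped with tf.data.Dataset.from_generator. The feature names (`left_value`, `right_value`), the `synthetic_curve` stand-in, and the way negatives are misaligned are hypothetical choices, not something settled in this issue. The sketch passes output_types and output_shapes as mentioned above; newer TensorFlow releases prefer output_signature instead.

```python
import random

try:
    import tensorflow as tf  # optional here; only input_fn() needs it
except ImportError:
    tf = None


def synthetic_curve(t):
    """Hypothetical stand-in for a light curve: a smooth bump around t = 0."""
    return 1.0 / (1.0 + t * t)


def gen_examples(num_examples, seed=0):
    """Yield dicts with 'left_value', 'right_value' and a 0/1 'label'."""
    rng = random.Random(seed)
    for _ in range(num_examples):
        t = rng.uniform(-3.0, 3.0)
        label = rng.randint(0, 1)
        # Positives sample both sides at the same time; negatives shift the
        # right side by at least one time unit in either direction.
        if label == 1:
            shift = 0.0
        else:
            shift = rng.choice([-1.0, 1.0]) * rng.uniform(1.0, 3.0)
        yield {
            "left_value": synthetic_curve(t),
            "right_value": synthetic_curve(t + shift),
            "label": label,
        }


def input_fn(num_examples=1000):
    """Wrap the generator in a tf.data.Dataset (requires TensorFlow)."""
    if tf is None:
        raise RuntimeError("TensorFlow is required for input_fn()")
    scalar = tf.TensorShape([])
    return tf.data.Dataset.from_generator(
        lambda: gen_examples(num_examples),
        output_types={
            "left_value": tf.float32,
            "right_value": tf.float32,
            "label": tf.int64,
        },
        output_shapes={
            "left_value": scalar,
            "right_value": scalar,
            "label": scalar,
        },
    )
```

Keeping the sampling logic in a plain generator means it can be unit-tested without TensorFlow, and the dataset wrapper stays a thin layer on top.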
https://github.com/aimalz/justice/pull/76 has an example of generating a dataset for all points in a light curve. Here, we want to sample instead of feeding all points at once, and also prefix tensors with left/right. For the negatives, we may be able to do some of this with dataset combinators,