CosmoStat / autometacal

Metacalibration and shape measurement by automatic differentiation
MIT License

Create TF Dataset of simulated galaxies #4

Closed EiffL closed 3 years ago

EiffL commented 3 years ago

To run the various tests we are interested in, we will need readily available galaxy images under a range of noise and PSF conditions. This issue tracks the development of a TensorFlow Dataset class that uses GalSim to generate a dataset of postage stamps, with each record having the following entries:

This way, we will be able to add a random shear to the galaxy image on the fly, and add whatever amount of noise we want.
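As a rough illustration of the "shear and noise on the fly" idea, here is a minimal sketch in plain NumPy. It uses an analytic Gaussian profile instead of GalSim so that it stays self-contained. The function names (`sheared_gaussian_stamp`, `add_noise`), the stamp size, and the SNR definition are assumptions made for illustration, not the project's code.

```python
import numpy as np


def sheared_gaussian_stamp(g1, g2, sigma=3.0, size=51, flux=1.0):
    """Draw a noiseless Gaussian galaxy with reduced shear (g1, g2) applied.

    Uses the area-preserving shear matrix
        A = 1/sqrt(1 - |g|^2) * [[1 + g1, g2], [g2, 1 - g1]]
    and evaluates the round profile at A^{-1} x.
    """
    g_sq = g1 ** 2 + g2 ** 2
    if g_sq >= 1.0:
        raise ValueError("shear must satisfy |g| < 1")
    half = size // 2
    # Pixel coordinates centred on the stamp.
    y, x = np.mgrid[-half:size - half, -half:size - half].astype(float)
    # Inverse of A (det A = 1): 1/sqrt(1 - |g|^2) * [[1 - g1, -g2], [-g2, 1 + g1]]
    norm = 1.0 / np.sqrt(1.0 - g_sq)
    xs = norm * ((1.0 - g1) * x - g2 * y)
    ys = norm * (-g2 * x + (1.0 + g1) * y)
    image = np.exp(-0.5 * (xs ** 2 + ys ** 2) / sigma ** 2)
    return flux * image / image.sum()


def add_noise(image, snr, rng):
    """Add white Gaussian noise; returns (noisy image, noise sigma).

    Assumed SNR definition: sqrt(sum(I^2)) / sigma_noise.
    """
    sigma_n = np.sqrt(np.sum(image ** 2)) / snr
    return image + rng.normal(0.0, sigma_n, size=image.shape), sigma_n


rng = np.random.default_rng(0)
g1, g2 = rng.uniform(-0.1, 0.1, size=2)  # random shear drawn per example
stamp = sheared_gaussian_stamp(g1, g2)
noisy, sigma_n = add_noise(stamp, snr=50.0, rng=rng)
```

Keeping the stored record noiseless and unsheared means the same galaxy can be reused with many shear and noise realisations, which is the point of applying both at load time.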

Ultimately, we will probably also want to be able to load the sims made by Axel. @aguinot can you describe here where to find your sims and how to read them?

andrevitorelli commented 3 years ago

I'm toying with the idea of modifying the tfds class so that it doesn't just unpack and assign images, but generates them. That way we can have just a text file as a "catalogue" (e.g., with position, ellipticity, SNR) and have GalSim generate images as needed. If this is too hard, I could of course have a separate class do this (which is what I'm working on right now). For reference, the tfds base class structure:

from typing import Any, Dict, Iterator, Tuple

import tensorflow_datasets as tfds


class galgen(tfds.core.GeneratorBasedBuilder):
    """DatasetBuilder for galgen dataset."""

    VERSION = tfds.core.Version('1.0.0')
    # The release notes key should match VERSION.
    RELEASE_NOTES = {
      '1.0.0': 'Initial release.',
    }

    def _info(self) -> tfds.core.DatasetInfo:
        """Dataset metadata (homepage, citation,...)."""
        return tfds.core.DatasetInfo(
            builder=self,
            features=tfds.features.FeaturesDict({
                'image': tfds.features.Image(shape=(256, 256, 3)),
                'label': tfds.features.ClassLabel(names=['no', 'yes']),
            }),
        )

    def _split_generators(self, dl_manager: tfds.download.DownloadManager):
        """Download the data and define splits."""

        # Placeholder URL from the tfds template.
        extracted_path = dl_manager.download_and_extract('http://data.org/data.zip')
        # dl_manager returns pathlib-like objects with `path.read_text()`,
        # `path.iterdir()`,...
        return {
            'train': self._generate_examples(path=extracted_path / 'train_images'),
            'test': self._generate_examples(path=extracted_path / 'test_images'),
        }

    def _generate_examples(self, path) -> Iterator[Tuple[str, Dict[str, Any]]]:
        """Generator of examples for each split."""
        for img_path in path.glob('*.jpeg'):
            # Yields (key, example)
            yield img_path.name, {
              'image': img_path,
              'label': 'yes' if img_path.name.startswith('yes_') else 'no',
            }

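
The catalogue-driven idea described above could be sketched roughly as follows: a plain-text catalogue holds one row per galaxy, and a generator turns each row into a parameter record that an image-drawing routine (GalSim, for instance) would then consume. The column names, the `EXAMPLE_CATALOGUE` contents, and the `read_catalogue` helper are all hypothetical, chosen only to illustrate the pattern.

```python
import csv
import io
from typing import Dict, Iterator, TextIO, Tuple

# Hypothetical catalogue layout: one galaxy per row.
EXAMPLE_CATALOGUE = """\
x,y,e1,e2,snr
10.5,20.0,0.05,-0.02,40
33.0,12.5,-0.10,0.08,25
"""


def read_catalogue(handle: TextIO) -> Iterator[Tuple[int, Dict[str, float]]]:
    """Yield (key, record) pairs, the same shape `_generate_examples` yields."""
    for key, row in enumerate(csv.DictReader(handle)):
        # Every column in this toy catalogue is numeric.
        yield key, {name: float(value) for name, value in row.items()}


records = list(read_catalogue(io.StringIO(EXAMPLE_CATALOGUE)))
```

Inside a builder, the loop body would presumably draw the postage stamp from each record before yielding it, so that only the small text catalogue needs to be stored or downloaded.
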
EiffL commented 3 years ago

Sweet! Here is a (very) rough code example of what we could do: https://colab.research.google.com/drive/1_I2SRHhdxX-xpz-q3Iv-lSHw6LtP2g04?usp=sharing

https://www.tensorflow.org/datasets/add_dataset#dataset_configurationvariants_tfdscorebuilderconfig

My old code for generating galaxies with GalSim: https://github.com/ml4astro/galaxy2galaxy/blob/6d8b20722a5545c8c79a19cb67c6131c061763ed/galaxy2galaxy/data_generators/cosmos.py#L67

EiffL commented 3 years ago

https://github.com/GalSim-developers/GalSim/blob/releases/2.2/examples/demo11.py

andrevitorelli commented 3 years ago

So, I created a "dataset" branch and put some code there. It's very crude, but I wanted to have something there that I could build on.

andrevitorelli commented 3 years ago

The current data generator is on track to be improved on Monday.

EiffL commented 3 years ago

How is this going? :-) Is the code in the dataset branch up to date? Happy to take a look at things if useful.

andrevitorelli commented 3 years ago

So, I got a bit carried away with that and ended up creating a repo for the different tfds datasets I made. They are here: https://github.com/andrevitorelli/TenGU

My suggestion is that we either use the inverse_cat or gal_gen as toy models.

EiffL commented 3 years ago

Awesome!!!

EiffL commented 3 years ago

This first issue on galaxy generation has been partially solved by #10. But that PR doesn't address all the points we had here yet; in particular, we don't have PSFs yet. So we will leave this issue open and add new features to the galaxy images dataset as they become needed for the implementation of the next steps, like #2.