Smith42 / astroddpm

A denoising diffusion probabilistic model synthesises galaxies that are qualitatively and physically indistinguishable from the real thing.
https://arxiv.org/abs/2111.01713
GNU Affero General Public License v3.0
48 stars 8 forks

Could you share the dataset? #1

Closed askerlee closed 2 years ago

askerlee commented 2 years ago

I tried to use your download script to download the images, but found that many of them are empty (0 bytes). Visiting the URLs manually returned "500 server error" or "301 Moved permanently". In the end only around 6k images downloaded successfully. (They were converted to around 2k npy files.)

So I wonder, could you please share the images you downloaded? If every user downloads their own copy of the images, that also adds load to the server.

Thank you so much!

Smith42 commented 2 years ago

Unfortunately the full SDSS dataset is huge, so I can't host the images myself (although I will look into a torrent-based solution...). The script worked for me when I last used it a couple of months ago, so it's odd that it has stopped working. Are you trying to download the SDSS images or the PROBES images?

askerlee commented 2 years ago

I see. I'm downloading the PROBES images. So the SDSS images are even larger? 😅 Is SDSS the main dataset for training the model? Thanks.

Smith42 commented 2 years ago

Yes, the full SDSS dataset is around a TB of data in total, covering 306,000 galaxies. I use that data for the statistical analysis in the paper, and use the PROBES dataset to produce the pretty galaxies you see in the figures (as those galaxies are large and particularly well resolved, with no obscuring foreground objects).

BTW I just checked and you should expect ~2000 PROBES galaxies in total, so it looks like the script is still working!
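If you want to double-check your own copy, here is a minimal sketch that counts the non-empty converted files and lists any 0-byte ones. It assumes the converted files sit in a single directory and end in `.npy`; the function name `summarise_downloads` and the directory name in the usage line are hypothetical, not part of the repo.

```python
from pathlib import Path


def summarise_downloads(directory, pattern="*.npy"):
    """Return (number of non-empty files, list of empty files).

    `directory` and `pattern` are assumptions about where the
    converted files live and how they are named.
    """
    files = sorted(Path(directory).glob(pattern))
    # A failed download typically leaves a 0-byte file behind.
    empty = [f for f in files if f.stat().st_size == 0]
    return len(files) - len(empty), empty
```

For example, `n_ok, empty = summarise_downloads("probes_npy")` should give `n_ok` close to 2000 if the scrape completed.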

askerlee commented 2 years ago

I see. This is very helpful info. In preprocess.py, I saw that three channels (G, R, Z) are combined into one image. Could you explain how to interpret these channels, i.e., how to convert them into RGB? Thanks.

Smith42 commented 2 years ago

Here we scrape from the DESI Legacy Survey DR9. Check out the write-up here: https://www.legacysurvey.org/dr9/description/.

They use the photometric system (https://en.wikipedia.org/wiki/Photometric_system), and you can think of g as "green", r as "red", and z as "near infrared". So you can map g -> blue, r -> green, and z -> red to make RGB imagery.
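A minimal sketch of that mapping, assuming `g`, `r`, and `z` are 2D NumPy arrays of the same shape. The percentile stretch is only an illustrative choice for display, not what preprocess.py or the Legacy Survey viewer does, and the function name `grz_to_rgb` is hypothetical.

```python
import numpy as np


def grz_to_rgb(g, r, z, percentile=99.5):
    """Stack g, r, z bands into an RGB image scaled to [0, 1].

    Channel mapping: z -> red, r -> green, g -> blue.
    The percentile-based scaling is an assumption for display only.
    """
    # Last axis is ordered (red, green, blue).
    rgb = np.stack([z, r, g], axis=-1).astype(np.float64)
    # Negative pixel values (e.g. from sky subtraction) are set to zero.
    rgb = np.clip(rgb, 0.0, None)
    # Use one scale for all channels so the relative colours are kept.
    scale = np.percentile(rgb, percentile)
    if scale > 0:
        rgb = rgb / scale
    # Saturate the brightest pixels at 1.
    return np.clip(rgb, 0.0, 1.0)
```

The result has shape `(height, width, 3)` and can be passed straight to `matplotlib.pyplot.imshow`.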

askerlee commented 2 years ago

I see. Thank you! 😄