LSSTISSC / Tidalsaurus

Detecting Tidal Features to Uncover Galaxy Interactions
MIT License

Retrieve images from HSC database using unagi #4

EiffL closed this issue 2 years ago

EiffL commented 2 years ago

Once we have an SQL query that we like in #3, we can use the unagi tool to download all the images from that query in bulk.

I have a notebook that illustrates how to download the data with unagi here: https://github.com/EiffL/Quarks2CosmosDataChallenge/blob/main/notebooks/HSCDataPreparation.ipynb

EiffL commented 2 years ago

But we'll probably want to create a small dedicated script

a-desmons commented 2 years ago

@EiffL I tried to use your script and got the following error:

[image: unagi_error]

Does the code work on your end? Otherwise I might just have to write a script to use hsc_cutout instead of hsc_bulk_cutout. Also, will we need to extract the PSFs for the objects too or is it not relevant for this project?

EiffL commented 2 years ago

yeahhh so it works for me, but maybe that's because I have a big computer with a lot of cores... Try disabling the parallel retrieval of the catalog by setting nproc=1 (the notebook downloads using 2 parallel jobs) in the options to task.hsc_bulk_cutout

And otherwise, we don't need the mask (the mask=True option in the bulk_cutout). We could make use of the PSF... but maybe not at this first stage. It could be used to generate random perturbations of the images that are consistent with PSF variations, but that's a level of refinement we can think about later.

EiffL commented 2 years ago

humm but for some reason I couldn't download from pdr3 with unagi, maybe it hasn't been updated for that yet

EiffL commented 2 years ago

ah also, we want to retrieve all bands, so for the filters option, you can set:

filters = ['HSC-G', 'HSC-R', 'HSC-I', 'HSC-Z', 'HSC-Y']

EiffL commented 2 years ago

and we probably want stamps of size 64x64 at least
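
For reference, the settings suggested so far can be collected in one place. This is only a sketch and does not call unagi itself: nproc, mask and filters are the option names mentioned in this thread, while stamp_size_pix, HSC_PIXEL_SCALE_ARCSEC and cutout_options are names made up here for illustration. The pixel scale of 0.168 arcsec per pixel is the nominal HSC value, assumed rather than taken from the thread; it is only used to show what a 64 pixel stamp corresponds to on the sky.

```python
# Nominal HSC pixel scale in arcsec per pixel (an assumption, not stated in the thread).
HSC_PIXEL_SCALE_ARCSEC = 0.168

# All five bands, as suggested above.
filters = ['HSC-G', 'HSC-R', 'HSC-I', 'HSC-Z', 'HSC-Y']

# Minimum stamp size discussed above, in pixels (hypothetical variable name).
stamp_size_pix = 64

# Angular size of the stamp, in case the cutout size has to be given in arcsec.
stamp_size_arcsec = stamp_size_pix * HSC_PIXEL_SCALE_ARCSEC
half_size_arcsec = stamp_size_arcsec / 2

# Options discussed in this thread, gathered in a plain dict (hypothetical name).
cutout_options = {
    "nproc": 1,          # no parallel retrieval
    "mask": False,       # the mask is not needed
    "filters": filters,  # retrieve all bands
}

print(f"{stamp_size_pix} px stamp = {stamp_size_arcsec:.2f} arcsec on a side")
```

How the stamp size itself is passed to task.hsc_bulk_cutout (which keyword, and whether it is a full or half size) is left out on purpose; the notebook linked above is the place to check.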

a-desmons commented 2 years ago

Okay sounds good, I'll try with those parameters! And yeah, I think it hasn't been updated for pdr3 yet. It seems you can get access to the internal release dr3 data, which I don't have permission for. I'll send a message to the unagi creators asking if they have a timeframe for when pdr3 will be available to use :)

EiffL commented 2 years ago

Great :-) and otherwise, pdr2 should still be good :-)

a-desmons commented 2 years ago

Stamps of size 288x288 (there seem to be a lot of empty cutouts: stars? or galaxies too far away?):

[image: pixel288]

EiffL commented 2 years ago

Humm yeah these seem to be overkill... Or maybe we would want to add a cut on size as well to make sure we have large enough objects.

Otherwise maybe going for 128x128 for a first try would work.

In any case, we will probably want to do a first pass with some fixed settings, maybe the ones you currently have, and see what we get. Depending on the results we might want to go back and refine the sample selection.

The only drawback of extracting large stamps is that it takes some space to store, but you can always cut down the stamps live while you run the dataset during the network training.
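
As a rough illustration of cutting down the stamps live, here is a minimal sketch of a center crop with NumPy. It assumes each stored stamp is an array with the bands first, shaped (bands, height, width); that layout and the center_crop helper are assumptions made for this example, not something defined by unagi or by this project.

```python
import numpy as np


def center_crop(stamp, size):
    """Return the central size x size region of an array shaped (..., H, W)."""
    h, w = stamp.shape[-2:]
    if size > h or size > w:
        raise ValueError(f"cannot crop a {h}x{w} stamp to {size}x{size}")
    top = (h - size) // 2
    left = (w - size) // 2
    return stamp[..., top:top + size, left:left + size]


# Stand-in for one stored 5-band 288x288 stamp, with a single bright central pixel.
stored = np.zeros((5, 288, 288), dtype=np.float32)
stored[:, 144, 144] = 1.0

# Crop to 128x128 on the fly, e.g. inside the data loading step of the training loop.
cropped = center_crop(stored, 128)

print(cropped.shape)                   # (5, 128, 128)
print(stored.nbytes / cropped.nbytes)  # 5.0625
```

With these numbers a 288x288 stamp takes about five times the storage of a 128x128 one, which is the trade-off mentioned above: more disk space, but the crop size can still be changed later without downloading again.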