EiffL closed this issue 2 years ago
But we'll probably want to create a small dedicated script
@EiffL I tried to use your script and got the following error:
Does the code work on your end? Otherwise I might just have to write a script to use hsc_cutout instead of hsc_bulk_cutout. Also, will we need to extract the PSFs for the objects too or is it not relevant for this project?
yeahhh so it works for me, but maybe that's because I have a big computer with a lot of cores... Try disabling the parallel retrieval of the catalog by setting nproc=1 (instead of downloading with 2 parallel jobs)
in the options to task.hsc_bulk_cutout
And otherwise, we don't need the mask (the mask=True
option in the bulk_cutout). We could make use of the PSF... but maybe not at this first stage. It could be used to generate random perturbations of the images that are consistent with PSF variations, but that's a level of refinement we can think about later.
humm but for some reason I couldn't download from pdr3 with unagi, maybe it hasn't been updated for that yet
ah also, we want to retrieve all bands, so for the filters option, you can set:
filters = ['HSC-G', 'HSC-R', 'HSC-I', 'HSC-Z', 'HSC-Y']
and we probably want stamps of size 64x64 at least
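To make the settings discussed above concrete, here is a minimal sketch that collects them in one place. The keyword names nproc, mask and filters are the ones mentioned in this thread. The 0.168 arcsec/pixel scale is the nominal HSC coadd value. The helper stamp_width_arcsec is hypothetical, and whether unagi expects the full width or the half-width of the cutout should be checked against the version in use.

```python
# Sketch only: gathers the options discussed in this thread.
# nproc, mask and filters are the keyword names quoted above;
# stamp_width_arcsec is a hypothetical helper, not part of unagi.

HSC_PIXEL_SCALE = 0.168  # arcsec per pixel, nominal HSC coadd pixel scale


def stamp_width_arcsec(n_pixels, pixel_scale=HSC_PIXEL_SCALE):
    """Angular width in arcsec of a square stamp n_pixels on a side."""
    return n_pixels * pixel_scale


cutout_options = {
    "filters": ['HSC-G', 'HSC-R', 'HSC-I', 'HSC-Z', 'HSC-Y'],  # all five bands
    "nproc": 1,      # no parallel download, in case many jobs trigger the error
    "mask": False,   # the mask plane is not needed for this project
}

# Full angular width of a 64x64 stamp; convert to whatever size
# convention (full or half width) the cutout call actually expects.
width_64 = stamp_width_arcsec(64)
```

These options would then be passed to task.hsc_bulk_cutout together with the catalog and the archive, as in the notebook.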
Okay sounds good, I'll try with those parameters! And yeah, I think it hasn't been updated for pdr3 yet; it seems you can get access to the internal dr3 release data, which I don't have permission for. I'll send a message to the unagi creators asking if they have a timeframe for when pdr3 will be available to use :)
Great :-) and otherwise, pdr2 should still be good :-)
Stamp size 288x288 (there seem to be a lot of empty cutouts: stars? or galaxies too far away?):
Humm yeah these seem to be overkill... Or maybe we would want to add a cut on size as well to make sure we have large enough objects.
Otherwise maybe going for 128x128 for a first try would work.
In any case, we will probably want to do a first pass with some fixed settings, maybe the ones you currently have, and see what we get. Depending on the results we might want to go back and refine the sample selection.
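If we do go back and add the size cut mentioned above, it could look roughly like this. This is only a sketch: the function name, the choice of size measure and the threshold are all hypothetical and would have to be matched to whatever size column the SQL query returns.

```python
import numpy as np


def select_large_objects(sizes_arcsec, min_size_arcsec):
    """Boolean mask keeping objects with a finite size above the threshold.

    sizes_arcsec: per-object size estimates (hypothetical catalog column).
    min_size_arcsec: hypothetical threshold, to be tuned on the first pass.
    """
    sizes = np.asarray(sizes_arcsec, dtype=float)
    # Objects with missing (NaN) sizes are dropped as well.
    return np.isfinite(sizes) & (sizes >= min_size_arcsec)
```

The mask can be applied to the catalog before handing it to the bulk cutout, so that small or unmeasured objects are never downloaded.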
The only drawback of extracting large stamps is that it takes some space to store, but you can always cut down the stamps live while you run the dataset during the network training.
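For the live cropping, a minimal NumPy sketch is below. It assumes the stamps are stored with height and width as the last two axes, e.g. (batch, band, height, width). That layout is an assumption and depends on how the dataset is written out.

```python
import numpy as np


def center_crop(stamps, size):
    """Return the central size x size region of the last two axes."""
    height, width = stamps.shape[-2:]
    if size > height or size > width:
        raise ValueError("crop size is larger than the stamp")
    top = (height - size) // 2
    left = (width - size) // 2
    return stamps[..., top:top + size, left:left + size]
```

For example, center_crop(batch, 128) turns stored 288x288 stamps into 128x128 ones on the fly, so the large stamps only need to be downloaded once.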
Once we have an SQL query that we like in #3, we can use the unagi tool to download all the images from that query in bulk.
I have a notebook that illustrates how to download the data with unagi here: https://github.com/EiffL/Quarks2CosmosDataChallenge/blob/main/notebooks/HSCDataPreparation.ipynb