AstroPile / FlatironMeeting2024

AstroPile meet-up at the Flatiron Institute
https://astropile.github.io/FlatironMeeting2024/
MIT License

[Data] JWST / HST deep surveys #6

Open · mhuertascompany opened this issue 5 months ago

mhuertascompany commented 5 months ago

JWST / HST deep surveys

Add a new dataset to AstroPile built from public deep HST and JWST galaxy surveys.

Contacts / Participants: Marc Huertas-Company + anyone interested

Goals and deliverables

Build an AstroPile dataset of JWST and HST imaging stamps, and investigate whether spectra can be added for some sources. The main data source is MAST. However, I expect catalog information to come in different formats for different surveys, so homogenization may be an issue.
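One way the homogenization could be handled is a per-survey column map onto a shared schema. A minimal sketch, assuming catalogs are read as lists of row dicts; the survey keys, the source column names, and the target schema (object_id, ra, dec, z_phot) are all hypothetical placeholders, not the real DJA/MAST column names.

```python
# Hypothetical source column names per survey; replace with the real catalog columns.
COLUMN_MAPS = {
    "ceers": {"id": "object_id", "ra": "ra", "dec": "dec", "zphot": "z_phot"},
    "cosmos_web": {"ID": "object_id", "RA_DEG": "ra", "DEC_DEG": "dec", "ZPHOT_BEST": "z_phot"},
}
# Hypothetical shared schema that every homogenized row must provide.
REQUIRED = ("object_id", "ra", "dec", "z_phot")


def homogenize(rows, survey):
    """Rename each row's columns to the shared schema and tag it with its survey."""
    mapping = COLUMN_MAPS[survey]
    out = []
    for row in rows:
        # Keep only the columns we know how to map; everything else is dropped.
        rec = {mapping[key]: value for key, value in row.items() if key in mapping}
        missing = [col for col in REQUIRED if col not in rec]
        if missing:
            raise KeyError(f"{survey}: missing columns {missing}")
        rec["survey"] = survey
        out.append(rec)
    return out


rows = [{"ID": 1, "RA_DEG": 150.1, "DEC_DEG": 2.2, "ZPHOT_BEST": 1.5, "FLAG": 0}]
print(homogenize(rows, "cosmos_web"))
```

Unit or convention differences (flux units, magnitude systems) would need per-survey conversion functions on top of the renaming.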

Resources needed

Mostly enthusiasm, plus familiarity with the AstroPile format and with imaging/MAST archives.

Detailed description

[add additional details about the project]

mhuertascompany commented 5 months ago

Started downloading COSMOS-Web data from DJA (the DAWN JWST Archive). Created branch jwst. Working for now.

mhuertascompany commented 5 months ago

Worked on downloading CEERS data from DJA; the COSMOS-Web download is still incomplete. CEERS contains ~70,000 galaxies. We now have a working script that downloads the large field images, cuts stamps, and stores them in an HDF5 file following the same structure as the HSC dataset. All updates are pushed to the jwst branch. For now the code is in notebook form.
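The stamp-cutting step can be sketched as below, assuming the source positions have already been converted to pixel coordinates of the field image (e.g. via the image WCS) and that stamps running off the image edge should be zero-padded. The function names are hypothetical and this is not the notebook code; writing the resulting array to HDF5 (e.g. with h5py) is left out.

```python
import numpy as np


def cut_stamp(image, x, y, size, fill=0.0):
    """Cut a size x size stamp centred on pixel (x, y); pixels off the image get `fill`."""
    half = size // 2
    stamp = np.full((size, size), fill, dtype=float)
    # Requested window in image coordinates (rows follow y, columns follow x).
    y0 = int(np.floor(y + 0.5)) - half
    x0 = int(np.floor(x + 0.5)) - half
    y1, x1 = y0 + size, x0 + size
    # Clip the window to the image bounds.
    iy0, ix0 = max(y0, 0), max(x0, 0)
    iy1, ix1 = min(y1, image.shape[0]), min(x1, image.shape[1])
    if iy0 < iy1 and ix0 < ix1:
        stamp[iy0 - y0:iy1 - y0, ix0 - x0:ix1 - x0] = image[iy0:iy1, ix0:ix1]
    return stamp


def cut_stamps(images, positions, size):
    """Stack stamps into an (n_sources, n_bands, size, size) array, one image per band."""
    out = np.zeros((len(positions), len(images), size, size), dtype=np.float32)
    for i, (x, y) in enumerate(positions):
        for b, image in enumerate(images):
            out[i, b] = cut_stamp(image, x, y, size)
    return out
```

Cutting all bands at the same pixel position assumes the band images share a pixel grid, as the drizzled mosaics usually do; otherwise each band needs its own WCS conversion.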

To do: convert to the AstroPile format. Currently looking at this function:

def save_in_standard_format(catalog_filename, cutouts_filename, output_dir, num_processes=None):
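Whatever save_in_standard_format does internally, the conversion needs to pair each catalog row with its stamp. A minimal sketch of that join, assuming both files share an object ID and the catalog is held as a dict of column arrays; build_records and the object_id key are hypothetical, not part of AstroPile.

```python
import numpy as np


def build_records(catalog, cutouts, cutout_ids, id_key="object_id"):
    """Join catalog rows (dict of column arrays) to cutouts by a shared object ID.

    Objects present in only one of the two inputs are dropped.
    """
    # intersect1d returns the sorted common IDs plus their positions in each input array.
    _, cat_idx, cut_idx = np.intersect1d(catalog[id_key], cutout_ids, return_indices=True)
    records = []
    for ci, ki in zip(cat_idx, cut_idx):
        rec = {name: column[ci] for name, column in catalog.items()}
        rec["image"] = cutouts[ki]
        records.append(rec)
    return records
```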

mhuertascompany commented 5 months ago

The JWST HF (Hugging Face) dataset is working; tested with Liam's data loader.

[Three screenshots from 2024-03-27]

mhuertascompany commented 5 months ago

Ran a baseline photo-z network on the 6-band imaging using Liam's wrapper. Disaster.

[Three screenshots from 2024-03-28]

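To put numbers on a result like this, the usual photo-z summary statistics can be computed from predicted and true redshifts. A sketch using the common definitions (normalized residual dz/(1+z), median bias, sigma_NMAD as 1.48 times the median absolute deviation of the normalized residuals, and the outlier fraction with |dz|/(1+z) > 0.15); this is not what produced the plots above.

```python
import numpy as np


def photoz_metrics(z_true, z_pred, outlier_cut=0.15):
    """Bias, sigma_NMAD and outlier fraction of photometric redshift predictions."""
    z_true = np.asarray(z_true, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)
    dz = (z_pred - z_true) / (1.0 + z_true)  # normalized residual
    bias = np.median(dz)
    sigma_nmad = 1.48 * np.median(np.abs(dz - bias))
    outlier_frac = np.mean(np.abs(dz) > outlier_cut)
    return {"bias": bias, "sigma_nmad": sigma_nmad, "outlier_frac": outlier_frac}
```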
kartheikiyer commented 5 months ago

This is similar to what I got yesterday with IOB + ILI, so it might be a dataset issue rather than a pipeline/data-loader issue. It's good to know that it's reproducible. :)

mhuertascompany commented 5 months ago
[Screenshot from 2024-03-28]

Trying a normalizing flow on the ResNet summary statistics... still training.
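For reference, the simplest conditional flow is a single affine layer whose shift and log-scale depend on the summary statistics; the change-of-variables likelihood then reduces to a heteroscedastic Gaussian. The sketch below fits one by gradient descent in NumPy, assuming the shift and log-scale are linear in the summaries. It only illustrates the maximum-likelihood objective; a real flow (e.g. stacked MAF or spline layers in a deep-learning framework) would replace both the linear maps and the optimizer.

```python
import numpy as np


def fit_affine_flow(s, y, lr=0.02, steps=5000):
    """Fit p(y | s) with one conditional affine flow layer by maximum likelihood.

    Base density N(0, 1); the flow maps u = (y - mu(s)) * exp(-log_sigma(s)),
    with mu and log_sigma linear in the summary statistics s of shape (n, d).
    """
    n, d = s.shape
    w, b = np.zeros(d), 0.0  # shift parameters
    v, c = np.zeros(d), 0.0  # log-scale parameters
    for _ in range(steps):
        log_sigma = s @ v + c
        u = (y - (s @ w + b)) * np.exp(-log_sigma)
        # Per-sample NLL = 0.5 u^2 + log_sigma + const (log|du/dy| = -log_sigma).
        g_mu = -u * np.exp(-log_sigma)  # dNLL/dmu
        g_ls = 1.0 - u ** 2             # dNLL/dlog_sigma
        w -= lr * s.T @ g_mu / n
        b -= lr * g_mu.mean()
        v -= lr * s.T @ g_ls / n
        c -= lr * g_ls.mean()
    return w, b, v, c


def log_prob(s, y, params):
    """Log density of y given s under the fitted flow."""
    w, b, v, c = params
    log_sigma = s @ v + c
    u = (y - (s @ w + b)) * np.exp(-log_sigma)
    return -0.5 * u ** 2 - 0.5 * np.log(2 * np.pi) - log_sigma
```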

kartheikiyer commented 5 months ago

Finally got the data preprocessing and loading classes working across multiple fields. Made a basic comparison of the wide, deep, and cluster fields to see how things look.

[image] Plot 1: comparison of different fields imported from DJA in the same format. I needed to manually add a couple of photo-z files that weren't in the correct format, but everything is compatible with the HDF5 loader @mhuertascompany wrote and the training-dataset wrapper from @lhparker1.

A couple more plots incoming (and code to be added to the repo soon).

kartheikiyer commented 5 months ago

Trained a basic autoencoder (AE) with the IOB (information-ordered bottleneck) layer (paper), using only single-band (F200W) images for now, but the code can be scaled to arbitrary filter sets.

[image]
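The core of the IOB layer, as we read the paper, is to train with the latent vector truncated to a randomly sampled width k, so that the earliest latent dimensions are pushed to carry the most information. A NumPy sketch of just that masking step, assuming k is drawn uniformly per sample; iob_truncate is a hypothetical name and this is not the code used for the run above.

```python
import numpy as np


def iob_truncate(z, rng, k_min=1):
    """Zero out latent dimensions beyond a randomly drawn width k (one k per sample).

    z: (n_samples, n_latent) array of encoder outputs.
    Returns the masked latents and the sampled widths.
    """
    n, d = z.shape
    k = rng.integers(k_min, d + 1, size=n)     # widths in [k_min, d]
    mask = np.arange(d)[None, :] < k[:, None]  # True for the first k dims of each row
    return z * mask, k


rng = np.random.default_rng(0)
z_masked, widths = iob_truncate(rng.normal(size=(4, 8)), rng)
```

During training the decoder only sees the masked latents, so any prefix of the latent vector can later be used as a compressed code.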

kartheikiyer commented 5 months ago

The learned latent representation can then be compressed and correlated with galaxy properties, and we can compare which fields occupy different parts of the latent space. Plot here using PaCMAP.

[image]
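One simple way to correlate individual latent dimensions with a property such as redshift is a per-dimension Spearman rank correlation. A NumPy sketch, assuming latents is an (n_galaxies, n_latent) array and that there are no tied values (ties would need average ranks, e.g. scipy.stats.rankdata).

```python
import numpy as np


def rank(x):
    """Ranks 0..n-1 of a 1D array (assumes no ties)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r


def latent_property_correlation(latents, prop):
    """Spearman correlation between each latent dimension and a galaxy property."""
    prop_rank = rank(prop)
    return np.array([
        np.corrcoef(rank(latents[:, j]), prop_rank)[0, 1]
        for j in range(latents.shape[1])
    ])
```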

[image] Plot: redshift correlates with location in the latent space, which makes me think we ought to be seeing better redshift predictions, or else something else is going on (e.g. the IOB representation is learning SNR/compactness).
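A quick check of how much redshift information the latent space actually carries is k-nearest-neighbour regression on the latent vectors: if the kNN photo-z is also poor, the redshift trend is probably driven by something else (SNR, size). A brute-force NumPy sketch; k=10 and the median aggregation are arbitrary choices, and the pairwise-distance matrix needs chunking for large samples.

```python
import numpy as np


def knn_predict(train_x, train_y, query_x, k=10):
    """Predict a property for each query point as the median over its k nearest training points."""
    # Squared Euclidean distances, shape (n_query, n_train).
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k smallest distances in each row (unordered, which is fine for a median).
    nearest = np.argpartition(d2, kth=k - 1, axis=1)[:, :k]
    return np.median(train_y[nearest], axis=1)
```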

[image]

[image] Plot: latent space color-coded by which field contributes the most galaxies to each cell. Probably wrong / needs to be renormalized for sample size.
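The renormalization could look like this: bin the 2D embedding, count galaxies per field in each cell, divide each field's counts by that field's total size, and take the argmax. A NumPy sketch assuming integer field labels 0..n_fields-1; dominant_field_map is a hypothetical helper and empty cells are marked -1.

```python
import numpy as np


def dominant_field_map(xy, field, n_fields, bins=20, normalize=True):
    """Index of the field contributing the most galaxies to each grid cell (-1 if empty).

    With normalize=True, each field's counts are divided by its total size first,
    so large fields do not dominate every cell by sheer numbers.
    """
    edges_x = np.linspace(xy[:, 0].min(), xy[:, 0].max(), bins + 1)
    edges_y = np.linspace(xy[:, 1].min(), xy[:, 1].max(), bins + 1)
    counts = np.zeros((n_fields, bins, bins))
    for f in range(n_fields):
        sel = field == f
        counts[f], _, _ = np.histogram2d(xy[sel, 0], xy[sel, 1], bins=[edges_x, edges_y])
    if normalize:
        totals = counts.sum(axis=(1, 2))
        counts = counts / np.maximum(totals, 1)[:, None, None]
    dominant = counts.argmax(axis=0)
    dominant[counts.sum(axis=0) == 0] = -1
    return dominant
```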