Closed by EiffL 1 year ago
It's currently working, but there is a small problem in the ordering of the training/testing datasets. By default, images are sorted from brightest to faintest, and I was selecting the last objects for testing, so the test set contained only the faintest objects. That immediately translates to a large distribution shift between the training and testing samples.
This can be fixed by randomizing the order of the objects within each file and holding out a fraction of every file for testing.
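A minimal sketch of that fix, not the code in this PR: the helper name `split_indices`, the 20% test fraction, the seeds, and the file names and sizes are all hypothetical. It shuffles the object indices of each file independently and holds out the same fraction from every file, so brightness no longer determines which split an object lands in.

```python
import numpy as np


def split_indices(n_objects, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) for one file after shuffling its objects.

    Hypothetical helper for illustration; the fraction and seed are assumptions.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_objects)  # random order instead of brightest-to-faintest
    n_test = int(round(test_fraction * n_objects))
    return perm[n_test:], perm[:n_test]


# Hypothetical file names and object counts, one entry per catalog file.
file_sizes = {"file_a": 1000, "file_b": 500}

# Shuffle and split every file separately, so each file contributes
# the same fraction of its objects to the test set.
splits = {
    name: split_indices(n, test_fraction=0.2, seed=i)
    for i, (name, n) in enumerate(file_sizes.items())
}
```

Fixing the seed per file keeps the split reproducible across runs, which matters if the train and test sets are generated in separate passes.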
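The dataset can be loaded as follows:

```python
from datasets import load_dataset

# Load the dataset from the loading script added in this PR
dset = load_dataset('astroclip/datasets/legacy_survey.py')

# Fetch a single example from the training split
example = dset['train'][6]
```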
ok, this works!
This PR adds a Hugging Face dataset with matching images and spectra.