Open dylanrandle opened 5 years ago
Link to data: https://data.galaxyzoo.org
I uploaded a cleaned-up & compressed version of the Galaxy data at:
The original *.jpg images took 13 minutes to read. The compressed NetCDF/HDF5 format (via Xarray) only takes 4 seconds :)
The compressed data are generated by these notebooks:
Then I tried a tiny CNN on the data, and got a training RMSE of ~0.09, validation RMSE of ~0.1, and a test RMSE of ~0.105 (obtained by uploading a CSV file to the original Kaggle challenge). Notebooks:
For reference, the top score on the leaderboard is an RMSE of ~0.075. I think some data augmentation is needed to reach a score that good...
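For anyone comparing numbers, here is a minimal sketch of how the RMSE figures quoted above could be computed locally. It assumes the error is pooled over every galaxy and every output column; the function name `rmse` and the list-of-rows input layout are illustrative choices, not taken from the notebooks.

```python
import math


def rmse(predictions, targets):
    """Root-mean-square error pooled over all rows and all output columns.

    Both arguments are sequences of rows; each row is a sequence of floats.
    (Assumed layout for illustration only.)
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same number of rows")

    squared_error_sum = 0.0
    count = 0
    for pred_row, true_row in zip(predictions, targets):
        if len(pred_row) != len(true_row):
            raise ValueError("each prediction row must match its target row in length")
        for p, t in zip(pred_row, true_row):
            squared_error_sum += (p - t) ** 2
            count += 1

    if count == 0:
        raise ValueError("cannot compute RMSE of empty input")
    return math.sqrt(squared_error_sum / count)
```

Running the same function on the training and validation splits gives directly comparable numbers; the test number still has to come from a Kaggle submission, since the test labels are not available locally.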
@memanuel Could you briefly summarize the results with DARTS? How do we get the test accuracy? I guess the only way is to upload CSV to Kaggle?
I believe the only way to find a test accuracy is to run the model on the test images and upload the predictions to Kaggle. I have not done this yet.
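A rough sketch of the "predict, then upload a CSV" step follows. It assumes the submission has one ID column followed by one column per predicted output, and that predictions are clipped to [0, 1]. The `GalaxyID` header, the column names, and the galaxy IDs in the demo are placeholders; the real header should be copied from the competition's sample submission file.

```python
import csv
import io


def write_submission(fileobj, galaxy_ids, predictions, column_names, id_column="GalaxyID"):
    """Write one row per galaxy: the ID followed by its predicted values.

    `id_column` and `column_names` are assumptions; copy the real header
    from the competition's sample submission.
    """
    if len(galaxy_ids) != len(predictions):
        raise ValueError("need exactly one prediction row per galaxy ID")

    writer = csv.writer(fileobj)
    writer.writerow([id_column, *column_names])
    for galaxy_id, row in zip(galaxy_ids, predictions):
        if len(row) != len(column_names):
            raise ValueError(f"row for {galaxy_id} has {len(row)} values, expected {len(column_names)}")
        # Clip to [0, 1]; an unconstrained regression head can overshoot this range.
        clipped = [min(1.0, max(0.0, float(value))) for value in row]
        writer.writerow([galaxy_id, *(f"{value:.6f}" for value in clipped)])


# Demo with made-up IDs and values, written to an in-memory buffer.
buffer = io.StringIO()
write_submission(
    buffer,
    galaxy_ids=[100008, 100023],
    predictions=[[0.383147, 0.616853], [1.2, -0.1]],
    column_names=["Class1.1", "Class1.2"],  # placeholder column names
)
demo_csv = buffer.getvalue()
```

To write a real file, pass `open("submission.csv", "w", newline="")` instead of the in-memory buffer.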
Tried ResNet-18 and "ResNet-10" (defined in https://github.com/capstone2019-neuralsearch/AC297r_2019_NAS/issues/3#issuecomment-541216854). Both reach a training RMSE of 0.06 after 15 epochs and 0.04 after 30 epochs, but the validation RMSE plateaus at about 0.1. This is severe overfitting.
Notebook: https://www.kaggle.com/zhuangjw/galaxy-resnet-pytorch?scriptVersionId=22948695
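One generic way to deal with a validation RMSE that stops improving while the training RMSE keeps falling is early stopping. This is not something the notebook is known to do; the sketch below is a framework-agnostic helper with hypothetical names (`EarlyStopping`, `patience`, `min_delta`), and the validation history in the demo is made up for illustration.

```python
class EarlyStopping:
    """Signal a stop once validation RMSE has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_rmse):
        """Record one epoch's validation RMSE; return True if training should stop."""
        if val_rmse < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best = val_rmse
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Demo on a made-up validation curve that flattens out near 0.1.
made_up_history = [0.14, 0.12, 0.105, 0.101, 0.100, 0.101, 0.102, 0.103]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_rmse in enumerate(made_up_history):
    if stopper.step(epoch, val_rmse):
        stopped_at = epoch
        break
```

Early stopping would not by itself push the validation RMSE below the plateau; it only avoids spending epochs on further memorizing the training set and identifies which checkpoint to keep.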
DARTS might do a better job, as it is in some sense optimizing for validation loss. I'm not sure how far it can go without the data augmentation (rotation, zoom-in, etc.) used by the winning solutions.
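As a starting point for the augmentation mentioned above, here is a small NumPy sketch of the cheapest variant: 90-degree rotations and mirror flips, which need no interpolation. It is not the winning solutions' pipeline, and it leaves out arbitrary-angle rotation and zoom. The function names are illustrative, and images are assumed to be square arrays of shape (H, W) or (H, W, C).

```python
import numpy as np


def dihedral_views(image):
    """Return the 8 rotated/mirrored versions of an (H, W) or (H, W, C) array."""
    views = []
    for quarter_turns in range(4):
        rotated = np.rot90(image, k=quarter_turns, axes=(0, 1))
        views.append(rotated)
        views.append(np.flip(rotated, axis=1))  # mirror left-right
    return views


def random_augment(image, rng):
    """Pick one of the 8 views at random, for use inside a training data loader."""
    out = np.rot90(image, k=int(rng.integers(4)), axes=(0, 1))
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)
    # Copy into contiguous memory; rot90/flip return strided views,
    # which some tensor libraries do not accept directly.
    return np.ascontiguousarray(out)


# Usage on a tiny stand-in "image".
rng = np.random.default_rng(0)
toy_image = np.arange(4).reshape(2, 2)
all_views = dihedral_views(toy_image)
one_view = random_augment(toy_image, rng)
```

The labels can be reused unchanged for every view, on the assumption that the morphology answers do not depend on the orientation of the galaxy in the image. The same eight views can also be averaged at prediction time.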
Try to get state of the art on Galaxy Zoo