capstone2019-neuralsearch / AC297r_2019_NAS

Harvard IACS Data Science Capstone: Neural Architecture Search (NAS) with Google

Identify good scientific datasets for DARTS #19

Open dylanrandle opened 4 years ago

dylanrandle commented 4 years ago

Figure out which scientific dataset makes sense for running DARTS

JiaweiZhuang commented 4 years ago

The key question is: which scientific datasets are good candidates for applying DARTS? Basic requirements:

  1. The dataset needs to be challenging, complex, and high-dimensional enough that running neural architecture search on it makes sense. The previously suggested graphene kirigami dataset (#3) is too simple: even a tiny model can achieve a top score.
  2. The ML problem needs to be a relatively standard supervised classification/regression problem, so that the DARTS code (https://github.com/quark0/darts) can be adapted without reimplementing the algorithm from scratch. The suggested structural optimization problem (https://github.com/google-research/neural-structural-optimization) is very interesting in its own right, but it is very different from a standard supervised task and requires the entire numerical solver to be written in a differentiable programming framework (TF, PyTorch, JAX). It is indeed possible to apply the DARTS idea/algorithm to any NN-related problem, including this one, but here we focus on the existing DARTS code/implementation to keep the semester-long project manageable.
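
For context on why a standard supervised setup matters: the core DARTS idea is to replace the discrete choice of operation on each edge with a softmax-weighted mixture of all candidate operations, so the architecture weights can be trained by gradient descent on an ordinary supervised loss. Below is a minimal pure-Python sketch of that relaxation, not the code from the quark0/darts repo. The candidate operations, the names `CANDIDATE_OPS`, `mixed_op`, and `discretize`, and the toy inputs are all hypothetical; the real search space uses convolutions, pooling, and skip connections.

```python
import math

def softmax(alphas):
    """Numerically stable softmax over a list of architecture weights."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy candidate operations acting on a feature vector.
# (Assumption for illustration only; DARTS uses conv/pool/skip ops.)
CANDIDATE_OPS = {
    "identity": lambda x: list(x),
    "zero": lambda x: [0.0 for _ in x],
    "relu": lambda x: [max(0.0, v) for v in x],
    "double": lambda x: [2.0 * v for v in x],
}

def mixed_op(x, alphas):
    """Continuous relaxation: softmax-weighted sum of every candidate op's output."""
    weights = softmax(alphas)
    outputs = [op(x) for op in CANDIDATE_OPS.values()]
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(x))
    ]

def discretize(alphas):
    """After search, keep only the op with the largest architecture weight."""
    names = list(CANDIDATE_OPS)
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]

if __name__ == "__main__":
    x = [1.0, -2.0]
    alphas = [0.0, 0.0, 0.0, 0.0]  # uniform mixture before any training
    print(mixed_op(x, alphas))
    print(discretize([0.1, -1.0, 2.0, 0.5]))
```

Because the mixture is differentiable in `alphas`, any dataset that comes with a plain training/validation loss can plug into this scheme, which is why a standard classification or regression task is the easiest fit for the existing code.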

Right now the Galaxy Zoo data (#18) seems like a good candidate, as it is a great physical/astronomical problem and there was a Kaggle competition that can be used as a performance baseline.