brandontrabucco / design-bench

Benchmarks for Model-Based Optimization
MIT License

Fix dataset loading, and other minor fixes #21

Open christopher-beckham opened 4 months ago

christopher-beckham commented 4 months ago

Hi,

DiskResource has been rewritten to support downloading from a Hugging Face datasets repository. (To keep things simple I removed the Google Cloud logic entirely, but if you think it should stay, we could merge the two.)

As it stands, it is hardcoded to download from this repo, but that can be changed by overriding DB_HF_DATA (see disk_resource.py). It would be good if you could test this branch with a sanitised design_bench_data folder to make sure that everything downloads correctly.
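
For illustration, here is a rough sketch of what an overridable download path could look like. It is not the code in disk_resource.py. It assumes DB_HF_DATA is read as an environment variable (it may instead be a module-level constant), and `DEFAULT_HF_REPO`, `resolve_hf_repo`, and `download_resource` are hypothetical names. The only real library call is `hf_hub_download` from `huggingface_hub`.

```python
import os

# Hypothetical placeholder: the repo actually hardcoded in this PR is not named here.
DEFAULT_HF_REPO = "your-username/design-bench-data"


def resolve_hf_repo(default=DEFAULT_HF_REPO):
    """Return the dataset repo id, preferring a DB_HF_DATA override.

    Assumption: DB_HF_DATA is an environment variable.
    """
    return os.environ.get("DB_HF_DATA", default)


def download_resource(relative_path, cache_dir=None):
    """Fetch one file from the dataset repo and return its local path."""
    # Imported lazily so this module still imports without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=resolve_hf_repo(),
        filename=relative_path,
        repo_type="dataset",  # the files live in a datasets repo, not a model repo
        cache_dir=cache_dir,
    )
```

Under these assumptions, running `DB_HF_DATA=someone/other-data python my_script.py` would point every download at a different repository without touching the code.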

Sadly, most datasets are missing their pretrained oracle weights :(. This means that most tasks take forever to import, since the import will try to train an oracle instead. These are the only pretrained weights I have on hand:

./ant_morphology/ant_morphology/gaussian_process.zip
./ant_morphology/ant_morphology/random_forest.zip
./dkitty_morphology/dkitty_morphology/gaussian_process.zip
./dkitty_morphology/dkitty_morphology/random_forest.zip
./hopper_controller/hopper_controller/random_forest.zip
./hopper_controller/hopper_controller/gaussian_process.zip
./superconductor/superconductor/random_forest.zip
./superconductor/superconductor/gaussian_process.zip
./tf_bind_8-SIX6_REF_R1/tf_bind_8/gaussian_process.zip
./tf_bind_8-SIX6_REF_R1/tf_bind_8/random_forest.zip

If you are able to fill in some of these gaps that would be good.
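
To see which gaps remain in a local copy of the data, something like the sketch below might help. It assumes the folder layout in the listing above (task directory, then a subdirectory, then the archive) and checks only for the two archive names that appear there. `ORACLES` and `missing_oracles` are hypothetical names, not part of design-bench.

```python
from pathlib import Path

# Only the two archive names that appear in the listing above.
ORACLES = ("gaussian_process.zip", "random_forest.zip")


def missing_oracles(data_root):
    """Map each task directory name to the oracle archives not found under it."""
    root = Path(data_root)
    report = {}
    for task_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Search recursively, since the archives sit one level below the task dir.
        found = {f.name for f in task_dir.rglob("*.zip")}
        missing = [name for name in ORACLES if name not in found]
        if missing:
            report[task_dir.name] = missing
    return report
```

Calling `missing_oracles("design_bench_data")` returns a dict containing only the tasks that lack at least one of the two archives.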

Other changes:

Thanks.