To test the models on the same physical scenarios that are presented to humans, we need to re-generate exactly those 150 x 8 HDF5s, but with the target objects painted red/yellow.
[x] Dan Bear (+ Eli) will start on this (write flag to switch to model_testing mode, which applies red/yellow paint)
[x] Dan Bear (+ Felix) will generalize the red/yellow painting to all eight domains (filenames identical to those used for human testing, except with a tag appended to each filename)
[x] Dan Bear (+ Felix) will upload the model test dataset to S3.
[x] Felix (+ Dan Bear) can validate that there is a 1-1 correspondence between the model test dataset and the human test dataset.
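A minimal sketch of how the 1-1 correspondence check could be done, assuming both datasets are available as lists of relative file paths (e.g. from a directory or S3 listing). The function name `check_correspondence`, the tag `-redyellow`, and the example paths are hypothetical; the actual tag appended to the model test filenames should be substituted.

```python
from pathlib import PurePosixPath


def check_correspondence(human_files, model_files, tag="-redyellow"):
    """Compare human and model test file listings.

    Assumption: a model test file has the same relative path as its human
    counterpart, except that `tag` (hypothetical default) is appended to the
    filename before the extension.
    """
    # Drop the extension but keep the directory, so identical stimulus names
    # in different domains do not collide.
    human_keys = [str(PurePosixPath(f).with_suffix("")) for f in human_files]

    model_keys = []
    untagged = []
    for f in model_files:
        key = str(PurePosixPath(f).with_suffix(""))
        if key.endswith(tag):
            # Strip the tag to recover the human-facing name.
            model_keys.append(key[: len(key) - len(tag)])
        else:
            untagged.append(f)

    human_set, model_set = set(human_keys), set(model_keys)
    no_duplicates = (
        len(human_keys) == len(human_set) and len(model_keys) == len(model_set)
    )
    return {
        "missing_in_model": sorted(human_set - model_set),
        "unexpected_in_model": sorted(model_set - human_set),
        "untagged_model_files": sorted(untagged),
        "one_to_one": human_set == model_set and not untagged and no_duplicates,
    }


if __name__ == "__main__":
    # Hypothetical listings for illustration only.
    human = ["dominoes/0001.hdf5", "dominoes/0002.hdf5"]
    model = ["dominoes/0001-redyellow.hdf5", "dominoes/0002-redyellow.hdf5"]
    print(check_correspondence(human, model))
```

If the check passes on the full listings, both should also contain 150 x 8 = 1200 files.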
Example of human input data for dominoes (not red/yellow painted): https://github.com/cogtoolslab/human-physics-benchmarking/blob/master/experiments/dominoes_pilot/human-physics-benchmarking-dominoes-pilot_production_1_experimental_stims.json
Subdirectory containing the code to run all human experiments, including human-facing metadata: https://github.com/cogtoolslab/human-physics-benchmarking/tree/master/experiments