brandontrabucco / design-bench

Benchmarks for Model-Based Optimization
MIT License

normalize and denormalize issue #8

Open Nathan-zh opened 2 years ago

Nathan-zh commented 2 years ago

Hi Brandon,

I recently installed your package and have been using its datasets. Thanks for your work building this benchmark.

I have a problem with the function task.normalize_x.

import design_bench
import numpy as np

# load the task
task = design_bench.make('AntMorphology-Exact-v0', relabel=False)
# take the 128 designs with the highest labels
aa = task.x[np.argsort(np.squeeze(task.y))[-128:]]
# predict their labels with the oracle
r1 = np.squeeze(task.predict(aa))

# normalize and then denormalize the designs
bb = task.normalize_x(aa)
cc = task.denormalize_x(bb)
# predict the labels again
r2 = np.squeeze(task.predict(cc))

print(np.max(r1), np.max(r2))       #--> 198.7532   406.76566
print(np.where((aa - cc) > 0.001))  #--> (array([], dtype=int64), array([], dtype=int64))

The output shows that normalizing and then denormalizing leaves the designs essentially unchanged (no element differs by more than 0.001), yet the predictions are quite different. Is there anything wrong with my code? It feels like a trivial issue, but I cannot figure out where the problem is.

Nathan

brandontrabucco commented 2 years ago

Hi Nathan-zh,

Thanks for bringing this issue with the AntMorphology task to my attention! From the snippet you provided, the issue may be caused by floating point errors that occur when computing the mean and standard deviation used to normalize the designs. The relevant code is at this location: https://github.com/brandontrabucco/design-bench/blob/27ed0c3d1c28acb43084fac1b317fc7268cd7c9a/design_bench/datasets/dataset_builder.py#L872.

In essence, the mean and standard deviation are calculated in a streaming fashion that does not require the entire dataset to be in memory at once (a design choice I made to accommodate larger MBO tasks in the future, which would need to load the dataset from disk one (x, y) pair at a time). However, this accumulation could be exacerbating floating point error in the normalization statistics, which is a possible explanation for the difference you are seeing above.
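To illustrate the failure mode, here is a toy sketch (not the library's actual code) that accumulates batch sums and sums of squares in float32 and compares the resulting statistics against a single float64 pass; the large offsets in a few columns roughly mimic the AntMorphology designs:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(50_000, 60)).astype(np.float32)
x[:, :5] += 250.0  # a few columns with large offsets

# streaming accumulation in float32, one batch at a time
total = np.zeros(x.shape[1], dtype=np.float32)
total_sq = np.zeros(x.shape[1], dtype=np.float32)
for batch in np.array_split(x, 100):
    total += batch.sum(axis=0, dtype=np.float32)
    total_sq += np.square(batch).sum(axis=0, dtype=np.float32)
mean32 = total / x.shape[0]
# clamp at zero to guard against a negative variance estimate caused by cancellation
var32 = np.maximum(total_sq / x.shape[0] - np.square(mean32), 0.0)
std32 = np.sqrt(var32)

# reference statistics computed in a single float64 pass
mean64 = x.astype(np.float64).mean(axis=0)
std64 = x.astype(np.float64).std(axis=0)

# how far the streaming float32 statistics drift from the float64 reference
print(np.abs(mean32 - mean64).max(), np.abs(std32 - std64).max())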

Switching to float64 for the AntMorphology task might be necessary if the problem is due to floating point error.
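As a quick check on your end, normalizing with float64 statistics computed directly from task.x should reveal whether precision is the culprit. This is just a sketch that bypasses the built-in normalize_x / denormalize_x and uses only the attributes from your snippet:

import design_bench
import numpy as np

task = design_bench.make('AntMorphology-Exact-v0', relabel=False)
aa = task.x[np.argsort(np.squeeze(task.y))[-128:]]

# compute normalization statistics in float64 over the full dataset
x64 = task.x.astype(np.float64)
mu, sigma = x64.mean(axis=0), x64.std(axis=0)
sigma = np.where(sigma == 0.0, 1.0, sigma)  # guard against constant columns

# normalize and denormalize in float64, then cast back to the dataset dtype
bb = (aa.astype(np.float64) - mu) / sigma
cc = ((bb * sigma) + mu).astype(task.x.dtype)

r1 = np.squeeze(task.predict(aa))
r2 = np.squeeze(task.predict(cc))
print(np.max(np.abs(aa - cc)), np.max(r1), np.max(r2))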

Nathan-zh commented 2 years ago

It does help if I switch to float64. I think this numerical problem is caused by the distribution of the features: most values are around 0, but a few can be as large as 200-300.
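For reference, the spacing between adjacent float32 values grows with magnitude, so entries near 300 are stored far more coarsely than entries near 0 (a quick illustration, unrelated to the benchmark code itself):

import numpy as np

print(np.spacing(np.float32(300.0)))                              #--> ~3.05e-05
print(np.float32(300.0) + np.float32(1e-5) == np.float32(300.0))  #--> True
print(np.spacing(np.float32(0.01)))                               #--> ~9.3e-10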

Nathan-zh commented 2 years ago

Hi Brandon,

Here is another issue, this time with the HopperController task. I am using the exact oracle, so the predictions should be identical to the labels, or at most slightly different since the oracle is a simulator.

import design_bench
import numpy as np

task = design_bench.make('HopperController-Exact-v0', relabel=False)
pred = task.predict(task.x[:10])

print(np.squeeze(pred))
#--> [59.14225  72.68841  57.52715  59.30107  68.945305 95.25469  54.364407
#     58.248234 57.225212 55.378292]

print(np.squeeze(task.y[:10]))
#--> [108.34371  128.48705  103.78237   92.259224 147.93976  124.293274
#     117.06348  148.98955  133.39757  101.68808 ]

But the output of this snippet is not what I expected. Would you please test this code? Thanks!

Nathan

brandontrabucco commented 2 years ago

Thanks for pointing this out. I currently think the issue is that the original dataset was collected with a stochastic policy, whereas, to speed up evaluation, I implemented the oracle for this task as deterministic so that we don't need to average the performance over multiple rollouts.
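As a possible workaround, you could try constructing the task with relabel=True, which is intended to recompute the stored labels with the benchmark oracle so that task.y and task.predict agree. This is a quick sanity check rather than a guaranteed fix:

import design_bench
import numpy as np

# relabel=True should score the stored designs with the same deterministic oracle
task = design_bench.make('HopperController-Exact-v0', relabel=True)
pred = np.squeeze(task.predict(task.x[:10]))
print(pred)
print(np.squeeze(task.y[:10]))  # expected to match the predictions above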

There is a pull request about this that I have yet to merge; I'll let you know once I do: https://github.com/brandontrabucco/design-bench/pull/3