Nathan-zh opened this issue 2 years ago (status: Open)
Hi Nathan-zh,
Thanks for bringing this issue to my attention about the AntMorphology task! From the snippet you provided, the issue may be caused by floating point errors that occur when computing the mean and standard deviation for normalizing the designs. The relevant code is at this location: https://github.com/brandontrabucco/design-bench/blob/27ed0c3d1c28acb43084fac1b317fc7268cd7c9a/design_bench/datasets/dataset_builder.py#L872.
In essence, the mean and standard deviation statistics are calculated in a streaming fashion that does not require the entire dataset to be in memory at once (a design choice I made to accommodate larger MBO tasks in the future, which would need to load the dataset directly from disk one (x, y) pair at a time). However, this streaming calculation could be exacerbating floating point error in the normalization statistics, which is one possible explanation for the difference you are seeing above.
Switching to float64 for the AntMorphology task might be necessary if the problem is due to floating point error.
It does help if I switch to float64. I think this numerical problem is caused by the distribution of features, i.e. most values are around 0 but a few values could be as large as 200-300.
Hi Brandon,
Here is another issue, about the Hopper Controller task. I am using the exact oracle model, so predictions should be identical to the labels (or perhaps slightly different, since the oracle is a simulator).
import numpy as np
import design_bench

task = design_bench.make('HopperController-Exact-v0', relabel=False)
pred = task.predict(task.x[:10])
print(np.squeeze(pred))
--> [59.14225 72.68841 57.52715 59.30107 68.945305 95.25469 54.364407
58.248234 57.225212 55.378292]
print(np.squeeze(task.y[:10]))
--> [108.34371 128.48705 103.78237 92.259224 147.93976 124.293274
117.06348 148.98955 133.39757 101.68808 ]
But the output of this snippet is not what I expected. Would you please test this code? Thanks!
Nathan
Thanks for pointing this out. I currently think the issue is that the original dataset was collected with a stochastic policy, but in order to speed up evaluation, I implemented the oracle for this task as deterministic in the benchmark, so that we don't need to average the performance over more than one rollout.
There is a pull request about this that I have yet to merge; I'll let you know once I do: https://github.com/brandontrabucco/design-bench/pull/3
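The mismatch this explains can be illustrated with a toy model (hypothetical numbers, not the actual Hopper simulator): dataset labels are single rollouts of a stochastic policy, so they scatter around the policy's expected return, while a deterministic oracle produces one fixed value; averaging many stochastic rollouts converges back to that value.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_rollout(expected_return, noise_scale=20.0):
    # One rollout of a stochastic policy: the return varies run to run
    # because actions are sampled. Purely illustrative noise model.
    return expected_return + noise_scale * rng.normal()

expected = 120.0  # hypothetical expected return of the policy

# Dataset labels: single noisy rollouts -- each differs from the oracle value.
labels = np.array([stochastic_rollout(expected) for _ in range(5)])

# Averaging many rollouts recovers the expected return the deterministic
# oracle would report.
avg = np.mean([stochastic_rollout(expected) for _ in range(10_000)])

print(labels)  # individual labels scatter around 120
print(avg)     # close to 120.0
```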
Hi Brandon,
I recently installed your package and am using its datasets. Thanks for your work building this benchmark.
I have a problem with the function task.normalize_x.
The output shows that normalization followed by denormalization does not change the features, but the predictions are quite different. Is there anything wrong with my code? It seems like a trivial issue, but I cannot figure out where the problem is.
Nathan
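For reference, the round-trip part of this behavior is expected of any affine normalization. The sketch below uses hypothetical standalone functions that mirror the semantics of the task's normalize_x/denormalize_x methods (this is not design-bench's actual implementation): mapping to z-scores and back is the identity up to floating point, even though the normalized array itself is very different from the raw features.

```python
import numpy as np

# Toy feature matrix standing in for task.x.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(100, 4))

# Per-feature statistics, as a normalizer would compute them.
mu, sigma = x.mean(axis=0), x.std(axis=0)

def normalize_x(x):
    # hypothetical stand-in for the task method: map to z-scores
    return (x - mu) / sigma

def denormalize_x(z):
    # inverse affine map
    return z * sigma + mu

# Round trip is the identity up to floating point...
assert np.allclose(denormalize_x(normalize_x(x)), x)

# ...but the normalized values themselves differ substantially from x,
# so a model expecting one representation will predict differently on the other.
print(np.abs(normalize_x(x) - x).max())  # large, not ~0
```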