yuanqidu opened this issue 7 months ago
Hello yuanqidu,

Thanks for your interest in design-bench!
The benchmark keeps two copies of each MBO dataset: a private internal version that matches the format expected by the oracle model, and a public version exposed to the user for benchmarking their own algorithms.
The code for this separation is located in four functions:

- `dataset_to_oracle_x`: https://github.com/brandontrabucco/design-bench/blob/e52939588421b5433f6f2e9b359cf013c542bd89/design_bench/oracles/oracle_builder.py#L261
- `dataset_to_oracle_y`: https://github.com/brandontrabucco/design-bench/blob/e52939588421b5433f6f2e9b359cf013c542bd89/design_bench/oracles/oracle_builder.py#L315
- `oracle_to_dataset_x`: https://github.com/brandontrabucco/design-bench/blob/e52939588421b5433f6f2e9b359cf013c542bd89/design_bench/oracles/oracle_builder.py#L359
- `oracle_to_dataset_y`: https://github.com/brandontrabucco/design-bench/blob/e52939588421b5433f6f2e9b359cf013c542bd89/design_bench/oracles/oracle_builder.py#L414

This boilerplate code handles the conversion between the oracle format and the public format.
For a particular task from design-bench, you can find out what format the oracle expects by checking `task.oracle.expect_normalized_x` and `task.oracle.expect_normalized_y`. Design-Bench manages these conversions internally to ensure the format is correct when `task.predict(xs)` is called, where `xs` is a batch of designs you want to evaluate.
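To make the flow concrete, here is a toy sketch of the idea (not the actual design-bench implementation — `OracleSketch` and all its statistics are hypothetical stand-ins): if the oracle was trained in normalized space, the wrapper normalizes designs on the way in and denormalizes predictions on the way out, so the user only ever sees the public dataset format.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class OracleSketch:
    """Hypothetical stand-in mirroring the dataset_to_oracle_x /
    oracle_to_dataset_y pattern described above."""
    oracle_fn: Callable[[List[float]], float]  # toy model in oracle space
    x_mean: float
    x_std: float
    y_mean: float
    y_std: float
    expect_normalized_x: bool = True
    expect_normalized_y: bool = True

    def dataset_to_oracle_x(self, xs):
        # normalize public-format designs into the oracle's format
        if self.expect_normalized_x:
            return [[(v - self.x_mean) / self.x_std for v in x] for x in xs]
        return xs

    def oracle_to_dataset_y(self, ys):
        # map oracle-space scores back to public-format scores
        if self.expect_normalized_y:
            return [y * self.y_std + self.y_mean for y in ys]
        return ys

    def predict(self, xs):
        oracle_xs = self.dataset_to_oracle_x(xs)
        oracle_ys = [self.oracle_fn(x) for x in oracle_xs]
        return self.oracle_to_dataset_y(oracle_ys)

# hypothetical toy oracle: sums features, trained in normalized space
wrapper = OracleSketch(oracle_fn=sum,
                       x_mean=1.0, x_std=2.0,
                       y_mean=0.0, y_std=1.0)
preds = wrapper.predict([[3.0, 5.0]])  # [3.0, 5.0] -> [1.0, 2.0] -> 3.0
```

The point is only the shape of the round trip: callers pass public-format designs to `predict` and never handle the oracle's internal format themselves.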
This section of the README may help too:
```python
import design_bench
task = design_bench.make('TFBind8-Exact-v0')

# convert x to logits of a categorical probability distribution
task.map_to_logits()
discrete_x = task.to_integers(task.x)

# normalize the inputs to have zero mean and unit variance
task.map_normalize_x()
original_x = task.denormalize_x(task.x)

# normalize the outputs to have zero mean and unit variance
task.map_normalize_y()
original_y = task.denormalize_y(task.y)

# remove the normalization applied to the outputs
task.map_denormalize_y()
normalized_y = task.normalize_y(task.y)

# remove the normalization applied to the inputs
task.map_denormalize_x()
normalized_x = task.normalize_x(task.x)

# convert x back to integers
task.map_to_integers()
continuous_x = task.to_logits(task.x)
```
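One property worth keeping in mind: each normalize/denormalize pair is an exact inverse, so round-tripping a batch recovers the original values. A toy check with made-up statistics (not design-bench code, and not any real task's statistics):

```python
# hypothetical normalization statistics for illustration only
mean, std = 4.0, 2.0

def normalize_x(x):
    # zero-mean, unit-variance transform under the toy statistics
    return [(v - mean) / std for v in x]

def denormalize_x(x):
    # inverse of normalize_x
    return [v * std + mean for v in x]

x = [2.0, 4.0, 10.0]
roundtrip = denormalize_x(normalize_x(x))  # recovers x exactly here
```

This is also why mismatched statistics matter: if the oracle's normalization was fit on one dataset and you normalize with another dataset's mean and std, the round trip no longer lands on the format the oracle was trained on.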
Dear authors,
Thanks for open-sourcing this library. I'm trying to understand whether the oracles were trained on normalized or unnormalized x. If they were trained on normalized x, which dataset was used to compute the normalization statistics? Since when we use the oracle we can only normalize with the provided (smaller) dataset, would this cause problems with the oracle's inputs if the oracle was normalized on a larger dataset? Thank you for your help.
Best, Yuanqi