google-research / nasbench

NASBench: A Neural Architecture Search Dataset and Benchmark
Apache License 2.0

Data Query is non-deterministic #23

Closed ThomasCassimon closed 3 years ago

ThomasCassimon commented 4 years ago

Hi,

I noticed that querying a specific architecture doesn't always result in getting the same parameters back. Below is an example cell, where the query result shows the same operations and adjacency matrix, but different values for the floating point parameters.

Below are the results from querying the same architecture 3 times, getting different results each time.

{'module_adjacency': array([[0, 1, 1, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0]], dtype=int8), 'module_operations': ['input', 'conv1x1-bn-relu', 'conv3x3-bn-relu', 'maxpool3x3', 'conv3x3-bn-relu', 'conv3x3-bn-relu', 'output'], 'trainable_parameters': 32426634, 'training_time': 4321.9140625, 'train_accuracy': 1.0, 'validation_accuracy': 0.9431089758872986, 'test_accuracy': 0.9406049847602844}
{'module_adjacency': array([[0, 1, 1, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0]], dtype=int8), 'module_operations': ['input', 'conv1x1-bn-relu', 'conv3x3-bn-relu', 'maxpool3x3', 'conv3x3-bn-relu', 'conv3x3-bn-relu', 'output'], 'trainable_parameters': 32426634, 'training_time': 4326.7412109375, 'train_accuracy': 1.0, 'validation_accuracy': 0.9487179517745972, 'test_accuracy': 0.944411039352417}
{'module_adjacency': array([[0, 1, 1, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0]], dtype=int8), 'module_operations': ['input', 'conv1x1-bn-relu', 'conv3x3-bn-relu', 'maxpool3x3', 'conv3x3-bn-relu', 'conv3x3-bn-relu', 'output'], 'trainable_parameters': 32426634, 'training_time': 4309.798828125, 'train_accuracy': 1.0, 'validation_accuracy': 0.9432091116905212, 'test_accuracy': 0.9445112347602844}

There is a difference in validation accuracy of 0.5%, and a difference in test accuracy of 0.4%. There is also a difference in training time of about 17 (I'm assuming this is in seconds?).
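For reference, the spread across the three runs can be computed directly; the values below are copied from the three query results quoted above:

```python
# Values taken from the three query() results shown above.
val = [0.9431089758872986, 0.9487179517745972, 0.9432091116905212]
test = [0.9406049847602844, 0.944411039352417, 0.9445112347602844]
time = [4321.9140625, 4326.7412109375, 4309.798828125]

def spread(xs):
    """Max-min range across the repeated runs."""
    return max(xs) - min(xs)

print(f"validation spread: {spread(val):.4f}")   # ~0.0056 (0.56%)
print(f"test spread:       {spread(test):.4f}")  # ~0.0039 (0.39%)
print(f"time spread:       {spread(time):.1f}")  # ~16.9 (seconds)
```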

Is there any word on where these inaccuracies come from, and what level of accuracy can be expected?

A brief round of testing gave the following tolerances for the cell shown in the examples above (the code is taken from one of my unit tests):

from typing import Dict

# self.nb101 is an api.NASBench instance; cell is the architecture above.
query_result: Dict = self.nb101.query(cell)

self.assertAlmostEqual(1.0, query_result["train_accuracy"], delta=0.005)
self.assertAlmostEqual(0.9431, query_result["validation_accuracy"], delta=0.006)
self.assertAlmostEqual(0.9406, query_result["test_accuracy"], delta=0.006)
self.assertAlmostEqual(4321.91, query_result["training_time"], delta=20)
self.assertEqual(32426634, query_result["trainable_parameters"])
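If seeding the API is not an option, one way to keep such assertions stable is to test against the mean of the stored runs with a tolerance of a few standard deviations. A sketch using the three validation accuracies quoted above (plain asserts instead of unittest for brevity):

```python
from statistics import mean, stdev

# Validation accuracies from the three query() results above.
val_runs = [0.9431089758872986, 0.9487179517745972, 0.9432091116905212]

mu, sigma = mean(val_runs), stdev(val_runs)
tolerance = 3 * sigma  # wide enough to cover any single sampled run

# Any one sampled run should fall within the tolerance of the mean.
for run in val_runs:
    assert abs(run - mu) <= tolerance
```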
ThomasCassimon commented 3 years ago

For anyone stumbling across this: it turns out this is intentional. Each architecture in the dataset was trained several times, and query() randomly samples one of those runs, which is why repeated queries return slightly different metrics. To make queries deterministic, pass a seed in the constructor of the API object: https://github.com/google-research/nasbench/blob/master/nasbench/api.py#L115
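A minimal sketch of why seeding makes queries reproducible: the dataset stores several training runs per architecture, and a query returns one run chosen by the API's own RNG. The class name and sampling logic below are an illustration of that mechanism, not the library's actual internals; the run values are the three validation accuracies quoted above.

```python
import random

class SeededQuerySketch:
    """Toy model of the seeded sampling behind query():
    several stored runs, one picked per query by a seeded RNG."""

    # Stored validation accuracies for the cell queried above.
    runs = [0.9431089758872986, 0.9487179517745972, 0.9432091116905212]

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def query(self):
        # Each call samples one of the stored runs.
        return self._rng.choice(self.runs)

# With the same seed, two sessions sample the same sequence of runs;
# with no seed, repeated sessions may differ, as in the issue report.
a = SeededQuerySketch(seed=42)
b = SeededQuerySketch(seed=42)
assert [a.query() for _ in range(5)] == [b.query() for _ in range(5)]
```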