google-research / nasbench

NASBench: A Neural Architecture Search Dataset and Benchmark
Apache License 2.0

Data Query is non-deterministic #23

Closed ThomasCassimon closed 3 years ago

ThomasCassimon commented 4 years ago

Hi,

I noticed that querying a specific architecture doesn't always result in getting the same parameters back. Below is an example cell, where the query result shows the same operations and adjacency matrix, but different values for the floating point parameters.

Below are the results from querying the same architecture 3 times, getting different results each time.

{'module_adjacency': array([[0, 1, 1, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0]], dtype=int8), 'module_operations': ['input', 'conv1x1-bn-relu', 'conv3x3-bn-relu', 'maxpool3x3', 'conv3x3-bn-relu', 'conv3x3-bn-relu', 'output'], 'trainable_parameters': 32426634, 'training_time': 4321.9140625, 'train_accuracy': 1.0, 'validation_accuracy': 0.9431089758872986, 'test_accuracy': 0.9406049847602844}
{'module_adjacency': array([[0, 1, 1, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0]], dtype=int8), 'module_operations': ['input', 'conv1x1-bn-relu', 'conv3x3-bn-relu', 'maxpool3x3', 'conv3x3-bn-relu', 'conv3x3-bn-relu', 'output'], 'trainable_parameters': 32426634, 'training_time': 4326.7412109375, 'train_accuracy': 1.0, 'validation_accuracy': 0.9487179517745972, 'test_accuracy': 0.944411039352417}
{'module_adjacency': array([[0, 1, 1, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0]], dtype=int8), 'module_operations': ['input', 'conv1x1-bn-relu', 'conv3x3-bn-relu', 'maxpool3x3', 'conv3x3-bn-relu', 'conv3x3-bn-relu', 'output'], 'trainable_parameters': 32426634, 'training_time': 4309.798828125, 'train_accuracy': 1.0, 'validation_accuracy': 0.9432091116905212, 'test_accuracy': 0.9445112347602844}

There is a difference in validation accuracy of 0.5%, and a difference in test accuracy of 0.4%. There is also a difference in training time of about 17 (I'm assuming this is in seconds?).
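For reference, the spread across the three runs can be computed directly; the values below are copied from the three query results quoted above:

```python
# Values taken from the three query() results shown above.
val = [0.9431089758872986, 0.9487179517745972, 0.9432091116905212]
test = [0.9406049847602844, 0.944411039352417, 0.9445112347602844]
time = [4321.9140625, 4326.7412109375, 4309.798828125]

def spread(xs):
    """Max-min range across the repeated runs."""
    return max(xs) - min(xs)

print(f"validation spread: {spread(val):.4f}")   # ~0.0056 (0.56%)
print(f"test spread:       {spread(test):.4f}")  # ~0.0039 (0.39%)
print(f"time spread:       {spread(time):.1f}")  # ~16.9 (seconds)
```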

Is there any word on where these inaccuracies come from, and what level of accuracy can be expected?

A brief round of testing gave the following tolerances for the cell shown in the examples above (the code is taken from one of my unit tests):

from typing import Dict

# self.nb101 is an api.NASBench instance; cell is the architecture above.
query_result: Dict = self.nb101.query(cell)

self.assertAlmostEqual(1.0, query_result["train_accuracy"], delta=0.005)
self.assertAlmostEqual(0.9431, query_result["validation_accuracy"], delta=0.006)
self.assertAlmostEqual(0.9406, query_result["test_accuracy"], delta=0.006)
self.assertAlmostEqual(4321.91, query_result["training_time"], delta=20)
self.assertEqual(32426634, query_result["trainable_parameters"])
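If seeding the API is not an option, one way to keep such assertions stable is to test against the mean of the stored runs with a tolerance of a few standard deviations. A sketch using the three validation accuracies quoted above (plain asserts instead of unittest for brevity):

```python
from statistics import mean, stdev

# Validation accuracies from the three query() results above.
val_runs = [0.9431089758872986, 0.9487179517745972, 0.9432091116905212]

mu, sigma = mean(val_runs), stdev(val_runs)
tolerance = 3 * sigma  # wide enough to cover any single sampled run

# Any one sampled run should fall within the tolerance of the mean.
for run in val_runs:
    assert abs(run - mu) <= tolerance
```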
ThomasCassimon commented 3 years ago

For anyone stumbling across this: it turns out this is intentional. Each architecture in the dataset was trained several times, and query() randomly samples one of those runs, which is why repeated queries return slightly different metrics. To make queries deterministic, pass a seed in the constructor of the API object: https://github.com/google-research/nasbench/blob/master/nasbench/api.py#L115
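A minimal sketch of why seeding makes queries reproducible: the dataset stores several training runs per architecture, and a query returns one run chosen by the API's own RNG. The class name and sampling logic below are an illustration of that mechanism, not the library's actual internals; the run values are the three validation accuracies quoted above.

```python
import random

class SeededQuerySketch:
    """Toy model of the seeded sampling behind query():
    several stored runs, one picked per query by a seeded RNG."""

    # Stored validation accuracies for the cell queried above.
    runs = [0.9431089758872986, 0.9487179517745972, 0.9432091116905212]

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def query(self):
        # Each call samples one of the stored runs.
        return self._rng.choice(self.runs)

# With the same seed, two sessions sample the same sequence of runs;
# with no seed, repeated sessions may differ, as in the issue report.
a = SeededQuerySketch(seed=42)
b = SeededQuerySketch(seed=42)
assert [a.query() for _ in range(5)] == [b.query() for _ in range(5)]
```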