cleverhans-lab / cleverhans

An adversarial example library for constructing attacks, building defenses, and benchmarking both
MIT License

Predicted Class Labels #1227

Open cvdolph opened 2 years ago

cvdolph commented 2 years ago

Feature request: report/record the predicted class labels when using do_eval or model_eval. Currently only the overall accuracy is reported; having the per-example predicted labels would make it possible to characterize model performance on specific test data and gain insight into the problem space.
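For illustration, a rough sketch of what having the labels would enable (this is not cleverhans code; it assumes `sess`, `x`, `predictions`, `X_test`, and `Y_test` are the same objects that would be passed to model_eval, and that the whole test set fits in a single sess.run call):

import numpy as np
import tensorflow as tf

# The predicted class label is just the argmax of the model output.
# For a large test set this should be batched the same way model_eval does.
label_op = tf.argmax(predictions, axis=-1)
predicted_labels = sess.run(label_op, feed_dict={x: X_test})

# Example of the kind of analysis this enables: per-class accuracy
true_labels = np.argmax(Y_test, axis=-1)
for c in np.unique(true_labels):
  mask = true_labels == c
  print("class %d accuracy: %.4f" % (c, (predicted_labels[mask] == c).mean()))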

cvdolph commented 2 years ago

Just wanted to ping on this. Is there a command that would provide the predicted class labels?

def model_eval(sess, x, y, predictions, X_test=None, Y_test=None,
               feed=None, args=None):
  """
  Compute the accuracy of a TF model on some data
  :param sess: TF session to use
  :param x: input placeholder
  :param y: output placeholder (for labels)
  :param predictions: model output predictions
  :param X_test: numpy array with training inputs
  :param Y_test: numpy array with training outputs
  :param feed: An optional dictionary that is appended to the feeding
               dictionary before the session runs. Can be used to feed
               the learning phase of a Keras model for instance.
  :param args: dict or argparse Namespace object.
               Should contain batch_size
  :return: a float with the accuracy value
  """
  global _model_eval_cache
  args = _ArgsWrapper(args or {})

  assert args.batch_size, "Batch size was not given in args dict"
  if X_test is None or Y_test is None:
    raise ValueError("X_test argument and Y_test argument "
                     "must be supplied.")

  # Define accuracy symbolically
  key = (y, predictions)
  if key in _model_eval_cache:
    correct_preds = _model_eval_cache[key]
  else:
    correct_preds = tf.equal(tf.argmax(y, axis=-1),
                             tf.argmax(predictions, axis=-1))
    _model_eval_cache[key] = correct_preds

  # Init result var
  accuracy = 0.0

  with sess.as_default():
    # Compute number of batches
    nb_batches = int(math.ceil(float(len(X_test)) / args.batch_size))
    assert nb_batches * args.batch_size >= len(X_test)

    X_cur = np.zeros((args.batch_size,) + X_test.shape[1:],
                     dtype=X_test.dtype)
    Y_cur = np.zeros((args.batch_size,) + Y_test.shape[1:],
                     dtype=Y_test.dtype)
    for batch in range(nb_batches):
      if batch % 100 == 0 and batch > 0:
        _logger.debug("Batch " + str(batch))

      # Must not use the `batch_indices` function here, because it
      # repeats some examples.
      # It's acceptable to repeat during training, but not eval.
      start = batch * args.batch_size
      end = min(len(X_test), start + args.batch_size)

      # The last batch may be smaller than all others. This should not
      # affect the accuracy disproportionately.
      cur_batch_size = end - start
      X_cur[:cur_batch_size] = X_test[start:end]
      Y_cur[:cur_batch_size] = Y_test[start:end]
      feed_dict = {x: X_cur, y: Y_cur}
      if feed is not None:
        feed_dict.update(feed)
      cur_corr_preds = correct_preds.eval(feed_dict=feed_dict)

      accuracy += cur_corr_preds[:cur_batch_size].sum()

    assert end >= len(X_test)

    # Divide by number of examples to get final value
    accuracy /= len(X_test)

  return accuracy

_model_eval_cache = {}
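There doesn't appear to be a built-in option for this, but the per-example labels are already implicit in the correct_preds computation above (tf.argmax(predictions, axis=-1)), so the same batching loop can be adapted to return them. A rough sketch, not part of cleverhans (model_eval_with_labels is a made-up name, and it takes batch_size directly instead of the args wrapper so the sketch stays self-contained):

import math
import numpy as np
import tensorflow as tf

def model_eval_with_labels(sess, x, y, predictions, X_test, Y_test,
                           batch_size=128, feed=None):
  """Same batching loop as model_eval, but also returns the predicted
  class label for every test example."""
  # Predicted label and per-example correctness, defined symbolically
  pred_labels_op = tf.argmax(predictions, axis=-1)
  correct_preds_op = tf.equal(tf.argmax(y, axis=-1), pred_labels_op)

  accuracy = 0.0
  all_labels = []
  nb_batches = int(math.ceil(float(len(X_test)) / batch_size))

  # Fixed-size buffers, as in model_eval, so the graph always sees
  # batch_size examples; the padded tail is sliced off below.
  X_cur = np.zeros((batch_size,) + X_test.shape[1:], dtype=X_test.dtype)
  Y_cur = np.zeros((batch_size,) + Y_test.shape[1:], dtype=Y_test.dtype)
  for batch in range(nb_batches):
    start = batch * batch_size
    end = min(len(X_test), start + batch_size)
    cur_batch_size = end - start
    X_cur[:cur_batch_size] = X_test[start:end]
    Y_cur[:cur_batch_size] = Y_test[start:end]
    feed_dict = {x: X_cur, y: Y_cur}
    if feed is not None:
      feed_dict.update(feed)
    cur_labels, cur_corr = sess.run([pred_labels_op, correct_preds_op],
                                    feed_dict=feed_dict)
    # Keep only the real examples from the last (possibly smaller) batch
    all_labels.append(cur_labels[:cur_batch_size])
    accuracy += cur_corr[:cur_batch_size].sum()

  accuracy /= len(X_test)
  return accuracy, np.concatenate(all_labels)

It would be called much like model_eval, e.g. acc, labels = model_eval_with_labels(sess, x, y, preds, X_test, Y_test, batch_size=128); labels[i] is then the class predicted for X_test[i], which can be broken down per class or per example as needed.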