google-research / mixmatch

eval_stats causes mixmatch to freeze #14

Closed DewaldHoman closed 5 years ago

DewaldHoman commented 5 years ago

When using mixmatch.py to train a dataset (e.g. cifar10 README example), the summary generated by eval_stats seems to freeze the program after a few epochs. The problem seems to be introduced at the start of evaluating the valid subset, during the loop:

    for subset in ('train_labeled', 'valid', 'test'):
        images, labels = self.tmp.cache[subset]
        predicted = []
        for x in range(0, images.shape[0], batch):
            p = self.session.run(
                classify_op,
                feed_dict={
                    self.ops.x: images[x:x + batch],
                    **(feed_extra or {})
                })
            predicted.append(p)
        predicted = np.concatenate(predicted, axis=0)
        accuracies.append((predicted.argmax(1) == labels).mean() * 100)

However, if I interrupt the program and run the same command again, training resumes from the correct epoch. (I am using TensorFlow 1.14 on a Titan X GPU.)

david-berthelot commented 5 years ago

When you say freezing, does it stay frozen? Otherwise, it's normal for training to pause while evaluation is running.

DewaldHoman commented 5 years ago

Yes, it stays frozen indefinitely. It only seems to hang after a few epochs, on the first batch of the valid subset that is passed to classify_op in the TF session.
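One way to confirm where the process is stuck, offered only as a rough sketch: Python's standard `faulthandler` module can dump the stack of every thread if a block of code runs longer than a chosen timeout. The helper name `hang_watchdog` and the timeout values are hypothetical and not part of mixmatch; the commented usage assumes the evaluation loop quoted above.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(timeout_s, file=sys.stderr):
    """Dump all thread stacks to `file` if the wrapped block exceeds timeout_s.

    Hypothetical helper for diagnosis only. It does not interrupt the
    block; it only reports where each thread is once the timeout passes.
    `file` must be backed by a real file descriptor (it needs fileno()).
    """
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=file)
    try:
        yield
    finally:
        # Disarm the pending dump once the block finishes (or raises).
        faulthandler.cancel_dump_traceback_later()


# Assumed usage inside the evaluation loop from this issue (not run here):
#
#     with hang_watchdog(120):
#         p = self.session.run(classify_op, feed_dict={...})
#
# If the call hangs, a "Timeout (...)!" header and the Python stack of
# each thread are written to stderr after 120 seconds.
```

The dump only shows Python frames, so it can confirm that the hang is inside `session.run` but not what the TensorFlow runtime is doing underneath.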

david-berthelot commented 5 years ago

I'm not sure how to investigate this, since I cannot reproduce it. Did you find anything new?

david-berthelot commented 5 years ago

Maybe this is a problem specific to your machine. Closing since I cannot reproduce it.