danielewworrall / harmonicConvolutions

Deep Translation and Rotation Equivariance
MIT License

How do you load a stored model? #4

Closed drewm1980 closed 7 years ago

drewm1980 commented 7 years ago

Very interesting paper! Thanks for sharing it!

I was able to run: python train.py 0 mnist deep_stable data to completion. It ran for 200 epochs in 30 min. on a Pascal Titan X.

It reported two slightly conflicting testing accuracies:

Model saved in file: ./checkpoints/deep_mnist/trialA
Testing test Acc.: 0.978261
Test accuracy: 0.971415

How do I actually load a trained model and run it on an image? I've used a couple of other deep learning APIs, but I'm new to TensorFlow... it seems like its deserializer relies on the variables already being defined in the Python namespace before anything can be loaded, but since you define everything inside functions, it's not clear how to load the model back in.

My main motivation was actually to get some idea how fast your network runs compared to a non-harmonic network of comparable accuracy. You count and compare the number of multiplications in your paper, but things like data access patterns and branch divergence matter a LOT on the gpu.
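For wall-clock comparisons like this, a small framework-agnostic harness is often enough. The `benchmark` helper below is hypothetical (not part of this repository), and for GPU frameworks each timed call must force synchronisation so the kernel actually finishes inside the timed region (fetching the result, as `sess.run` does, already has that effect):

```python
import time

def benchmark(fn, *args, warmup=3, runs=10):
    """Return mean wall-clock seconds per call of fn(*args).

    Warm-up iterations matter on a GPU: the first calls pay one-off
    costs such as kernel compilation and memory allocation.
    """
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) / runs

# Stand-in workload; in practice fn would run one forward pass.
t = benchmark(sum, range(100000))
print("%.6f s/call" % t)
```

Comparing two networks this way captures exactly the effects mentioned above (data access patterns, divergence) that multiply counts miss.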

Thanks!

StephanGarbin commented 7 years ago

Hi Andrew, thanks for your question. The guide to loading and saving can be found here. TensorFlow does rely on the names of the variables you want to load matching the ones you saved. That is not a problem, however, as you usually replicate the same network structure upon loading. This is also not inefficient, because you do not have to initialise explicitly when restoring. From the link above, saving works like this (and we do it in the same way):

# Create some variables.
v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...
# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, initialize the variables, do some work, save the
# variables to disk.
with tf.Session() as sess:
  sess.run(init_op)
  # Do some work with the model.
  ...
  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print("Model saved in file: %s" % save_path)

And loading like this:

# Create some variables.
v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...
# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Do some work with the model
  ...
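The key point is that the checkpoint maps variable names to values, so the loading script must rebuild the same graph (and hence the same variable names) before restoring. A framework-free sketch of that rebuild-then-restore pattern, using a plain dict and pickle purely as an analogy for `tf.train.Saver`:

```python
import os
import pickle
import tempfile

def build_model():
    # Stand-in for the network definition (e.g. the code in train.py);
    # the variable names must match between saving and loading scripts.
    return {"v1": [1.0, 2.0], "v2": [0.0]}

# --- training script: build, "train", save (analogous to saver.save) ---
model = build_model()
model["v1"] = [9.9, 8.8]          # pretend training updated these weights
ckpt = os.path.join(tempfile.mkdtemp(), "model.ckpt")
with open(ckpt, "wb") as f:
    pickle.dump(model, f)

# --- inference script: rebuild, restore (analogous to saver.restore) ---
restored = build_model()          # fresh, untrained values
with open(ckpt, "rb") as f:
    restored.update(pickle.load(f))  # overwrite initial values by name
```

After the restore, `restored` holds the trained values rather than the fresh initialisations, which is exactly what `saver.restore` achieves for TensorFlow variables.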

So what you can do is replicate the network definition (as in train.py), create a tf.train.Saver, and call saver.restore with the checkpoint path printed at the end of training (./checkpoints/deep_mnist/trialA in your run).

Does that help?

Best regards Stephan

drewm1980 commented 7 years ago

Hi Stephan, thanks for the reply!

I had already read TensorFlow's documentation, and was hoping you already had some code that can actually restore a model and run it; I wanted to add code that actually does timing to your existing scripts, but since your scripts re-train the network from scratch every time you run them, an edit-compile-debug cycle takes an hour (30 min. of training, maybe 30 min of tinkering).

Have you done a speed comparison (in seconds, not manually counted multiplies) of your network vs. a typical CNN? The paper makes it sound like it is mainly just 4x more multiplies to go from real to complex multiplication, but I'm having a bit of trouble confirming that the other stuff, like the formation of the discretized harmonic basis functions, gets compiled down to constant matrices rather than incurring overhead for shuffling data around at runtime.

Maybe the script that runs one of your networks on rotated versions of a larger image (i.e. the animations on your project page) would be a good one to instrument?

StephanGarbin commented 7 years ago

Hi Andrew,

Thanks for your response! Since saving/restoring models is easy in tensorflow as outlined above, and we expect people to use their own training/testing code most of the time, adding a 'restore' flag is unfortunately low on our list of priorities.

We have done profiling of key parts of the code and we are soon releasing updates that will increase run-time performance significantly - stay tuned for some exciting changes. :)

Regarding your suggestion:

The paper makes it sound like it is mainly just 4x more multiplies to go from real to complex multiplication, but I'm having a bit of trouble confirming that the other stuff, like the formation of the discretized harmonic basis functions, gets compiled down to constant matrices rather than incurring overhead for shuffling data around at runtime.

Our algorithm remains static during training and therefore benefits from any optimisations of the computation graph that the current version of tensorflow allows. Most of the 'other stuff' you are referring to happens only at initialisation and is fast in any case (you can double-check this by going through the theoretical arithmetic and memory complexity of the operations). As with most CNNs, it is therefore safe to conclude that the runtime is dominated by performing the convolutions. If you believe this is incorrect, we are keen to examine the results of your tests: we welcome any community input on speed tests and further experiments (especially given the multitude of machine and software configurations out there) - these important contributions are why we made the code open-source.
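On the arithmetic itself: one complex product does cost four real multiplications in the textbook form (three with the Gauss/Karatsuba rearrangement, at the price of extra additions), which is easy to confirm against Python's built-in complex type:

```python
def complex_mul(a, b, c, d):
    """(a + bi) * (c + di) computed via four real multiplications:
    (ac - bd) + (ad + bc)i."""
    real = a * c - b * d
    imag = a * d + b * c
    return real, imag

# Cross-check the four-multiply form against Python's complex arithmetic.
r, i = complex_mul(2.0, 3.0, 5.0, -1.0)
z = complex(2.0, 3.0) * complex(5.0, -1.0)
assert (r, i) == (z.real, z.imag)
```

Whether the 4x in multiplies translates to a 4x in wall-clock time is exactly the memory-access question raised above, so timed runs remain the decisive test.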

StephanGarbin commented 7 years ago

I'm closing this as model loading has been addressed. We will put a note in the docs if and when we decide to make this easier. Thanks for the suggestion.