odysseus0 opened this issue 5 years ago
Hi @odysseus0,
Thanks for your feedback. What they meant by "no more globals" is not that you can't use global variables in your own code (even though I agree it's generally bad practice, more on this below). They were talking about the fact that TensorFlow itself will stop using global scopes everywhere. For example, tf.Variable will not go away, it's just that it will not work the same way: it will no longer rely on global scopes.
In TF 1.x, when you write v = tf.Variable(0.0, name="my_var"), this variable gets added to the default graph, which really means adding it to a global scope. So even if the Python variable v goes out of scope, the variable still exists in the graph. To access it, you have to search for it by name, for example tf.get_default_graph().get_tensor_by_name("my_var:0"). Searching for stuff by name is brittle, and the namespace can get cluttered. Moreover, the variable was not initialized right away, you had to use global_variables_initializer() and so on.
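To make this concrete, here is a rough sketch of what that looks like in TF 1.x (typed from memory, so the exact output may differ slightly):
>>> import tensorflow as tf     # assuming TF 1.x here
>>> v = tf.Variable(0.0, name="my_var")
>>> del v                       # the Python variable is gone...
>>> tf.get_default_graph().get_tensor_by_name("my_var:0")  # ...but the graph still holds it
<tf.Tensor 'my_var:0' shape=() dtype=float32_ref>
>>> init = tf.global_variables_initializer()  # the variable still needs explicit initialization
>>> with tf.Session() as sess:
...     sess.run(init)
...     sess.run("my_var:0")
...
0.0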
Most of the problems with TensorFlow 1.x came from this design based on global scopes. The same is true of collections and other parts of TF 1.x. All of this global-scope mess is going away in TF 2; that's what they meant by "no more globals".
In TF 2, when you write v = tf.Variable(0.0), you are effectively creating a regular Python object. If v goes out of scope, the variable will be garbage-collected, just like any regular Python object. It's quite all right to use tf.Variable this way.
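For example, here is a quick illustration (not from the notebook) of a TF 2 variable behaving like a plain Python object, with eager execution enabled (the default):
>>> v = tf.Variable(0.0)     # created and initialized right away, no graph, no initializer
>>> v.assign_add(1.0)        # updated eagerly, no session needed
<tf.Variable 'UnreadVariable' shape=() dtype=float32, numpy=1.0>
>>> v.numpy()
1.0
>>> del v                    # once unreferenced, it is garbage-collected like any other object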
For the simplicity of the code examples in the notebook, I use global variables rather than wrapping everything in functions or classes, but of course in a real project it would be much cleaner to create classes to hold these variables, typically by creating a custom Keras layer. You would build the variables in the build() method (using self.add_weight(), which eventually creates a tf.Variable), and then use them in the call() method.
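As an illustration only (the layer name and shapes below are made up), such a custom layer might look roughly like this:
import tensorflow as tf

class MyBias(tf.keras.layers.Layer):
    def build(self, input_shape):
        # add_weight() eventually creates a tf.Variable, scoped to this layer instance
        self.bias = self.add_weight(name="bias", shape=(input_shape[-1],),
                                    initializer="zeros", trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        # use the variable built above
        return inputs + self.bias

layer = MyBias()
layer(tf.constant([[1., 2., 3.]]))  # the variable is built on the first call, then reused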
I hope this helps. If you are still troubled by some of the code examples, please don't hesitate to tell me which ones, this notebook is quite new so it may still contain some errors or unclear code. That said, these notebooks have been reviewed by the TensorFlow team, and they have even been added to the latest Deep Learning VM images on Google Cloud Platform, so hopefully most of the content should be correct.
Hope this helps!
Thank you so much for your detailed explanation! That makes things so much clearer for someone who is just getting started with TensorFlow, without any prior experience with TF 1.x.
So, to make sure I understand the point: the global that gets removed is the global scope of tf.Variable in the computation graph. Now a tf.Variable object simply has the scope of the Python object that refers to it.
Sorry for offering such an uneducated opinion, and thank you again for the kind correction.
You're most welcome, @odysseus0. :)
Yes, you are exactly right: the global that gets removed is the global scope of tf.Variable within the default graph. Whenever you created a tf.Variable in TF 1.x (or a tf.constant, or really any TF operation), it would get added to the default graph, by name. If another op already had the same name, TF would automatically add an index to make the name unique. For example:
>>> tf.__version__
'1.12.0'
>>> v1 = tf.constant(1., name="v")
>>> v2 = tf.constant(2., name="v")
>>> v3 = tf.constant(3., name="v")
>>> del v1
>>> del v2
>>> del v3
>>> tf.get_default_graph().get_operations()
[<tf.Operation 'v' type=Const>, <tf.Operation 'v_1' type=Const>, <tf.Operation 'v_2' type=Const>]
>>> tf.get_default_graph().get_operation_by_name('v_2')
<tf.Operation 'v_2' type=Const>
The fact that operations are automatically added to a global scope was confusing. Plus, this renaming business was pretty bad, especially if you had several libraries adding operations and potentially using conflicting names. And the fact that operations outlive their Python scope (the ops survive even after I deleted v1, v2 and v3) could get really confusing. To make matters worse, it was impossible to delete anything from a graph. All you could do was reset the whole graph:
>>> tf.reset_default_graph()
In contrast, look at how much nicer things are in TensorFlow 2:
>>> v1 = tf.constant(1.)
>>> v2 = tf.constant(2.)
>>> v3 = tf.constant(3.)
>>> del v1
>>> v1
Traceback[...]NameError: name 'v1' is not defined
>>> tf.get_default_graph()
Traceback[...]AttributeError: module 'tensorflow' has no attribute 'get_default_graph'
>>> tf.reset_default_graph()
Traceback[...]AttributeError: module 'tensorflow' has no attribute 'reset_default_graph'
Basically, the code is just regular Python, with regular logic and scopes. When a Python object is not referenced anymore, it is garbage collected, end of story. There's no default graph, and we don't need to worry about name clashes. Life is so beautiful, sometimes! :)
By default, every operation just runs immediately and the outputs are returned straight away. This is called eager execution (or eager mode), and it is the default in TensorFlow 2. But you can also convert your Python code to computation graphs automatically using tf.function() (this will generally speed up the function, a bit like just-in-time compilation):
>>> @tf.function # convert the Python function to a TensorFlow function based on a graph
... def add_5(some_arg):
... return some_arg + 5.
...
>>> add_5(tf.constant(10.)) # we get the result right away
<tf.Tensor: id=20, shape=(), dtype=float32, numpy=15.0>
>>> concrete_function = add_5.get_concrete_function(tf.TensorSpec(shape=[], dtype=tf.float32))
>>> concrete_function
<tensorflow.python.eager.function.ConcreteFunction at 0x12efa40b8>
>>> concrete_function.graph
<tensorflow.python.framework.func_graph.FuncGraph at 0x12f048f98>
>>> concrete_function.graph.get_operations()
[<tf.Operation 'some_arg' type=Placeholder>,
<tf.Operation 'add/y' type=Const>,
<tf.Operation 'add' type=Add>,
<tf.Operation 'Identity' type=Identity>]
To generate the graph, TensorFlow traced the Python function, meaning it ran it in graph mode. In this mode, operations don't actually execute any computation, they just get added to a graph. Moreover, tf.function will also apply AutoGraph, which analyzes the Python source code to find for loops and if statements (and a few other things) and adds them to the graph.
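Here is a small sketch of that (the function name and values are made up for illustration): AutoGraph turns the Python loop and conditional below into graph operations.
@tf.function
def clipped_sum(values, threshold):
    total = tf.constant(0.)
    for v in values:          # AutoGraph converts this Python loop into a graph loop
        total += v
    if total > threshold:     # and this Python conditional into a graph conditional
        total = threshold
    return total

clipped_sum(tf.constant([1., 2., 3.]), tf.constant(4.))  # returns a tensor equal to 4.0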
Note that the argument name (some_arg) is used as the name of the Placeholder operation in the graph. Whenever you call the add_5() function, the Placeholder's value is set to the value of the some_arg argument, then TensorFlow evaluates the output operation (Identity) and returns its value.
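To illustrate (output abbreviated), the concrete function itself is callable, so feeding it a value runs the traced graph and returns the output of the Identity operation:
>>> concrete_function(tf.constant(10.))  # feeds the Placeholder and runs the graph
<tf.Tensor: id=..., shape=(), dtype=float32, numpy=15.0>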
Wow, I'm going way too far! I hope I didn't confuse you with all these details. Hope it helps!
That was a great explanation.
@ageron This is certainly the best explanation I have read about this key difference between TF 1 and TF 2 so far on the internet. You are truly an amazing explainer! I am sure that your new book on TF 2 will be a huge success. Really want to read it myself already.
The notebook on the low-level TF API needs major revision, as TF 2.0 will at the very least discourage the use of globals through tf.Variable, and very likely has already entirely ditched them, per RFC: Variables in TensorFlow 2.0. Note that the official style guide, Effective TensorFlow 2.0, also explicitly states: "No more globals!"