Closed: zachgk closed this issue 3 years ago.
IMO, LearnedFunction is too abstract and generic. Although everyone who programs understands what a function is, the term is so broad that it will be confusing and does not help readers relate to deep learning. I assume LearnedFunction is meant to describe a function with parameters that can be learned during training. But then it becomes confusing for functions that do not have trainable weights: what do they learn? In contrast, block/layer/module is easy to associate with small building pieces that can be stacked or concatenated to form a medium or larger architecture. Medium-sized constructs can be reused to form larger ones, and the final architecture can be very deep, with many layers/modules. This compositional depth is one of the characteristics that differentiates deep learning from traditional machine learning methods.
In addition, we currently can't ask Java users to rely only on our documentation and Javadoc to learn deep learning. If we refer them to resources like the Deep Learning specialization on Coursera or cs231n, they will already know the layer/module concept from PyTorch or TensorFlow. It will be easier for them to relate to a concept that is the same as, or similar to, what most deep learning resources use.
I'm not against changing the name of Block, but if we do change it, I'd prefer layer or module.
@roywei
> IMO, LearnedFunction is too abstract and generic. Although everyone who programs understands what a function is, the term is so broad that it will be confusing and does not help readers relate to deep learning.
Let me try to explain the reasoning behind how Function relates to DL, because it isn't terribly clear. Consider the mathematical notion of a function for a minute: it has a set of objects in the domain, a set of objects in the codomain, and a map from each element of the domain to an element of the codomain. An ordinary function in programming is a procedure for going from each domain item to the corresponding codomain item.
A dataset is also a function. Take Imagenet: the images in Imagenet form the domain and the possible classes form the codomain. The labels for the data are then the function mappings. When I describe the Block as a LearnedFunction, the dataset is the function that it is learning.
In a more general sense, Imagenet should be a subset of all images classifiable into the 1000 classes. Ideally it would be a representative subset, but that is its own challenge. So our true goal is really to learn this larger function given only the Imagenet subset. This is why overfitting is a problem: the model learns the Imagenet function rather than the more general image classification function.
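The "dataset as a function" idea can be made concrete with a small sketch. This is purely illustrative (the class and field names here are invented for this example, not part of DJL): a dataset gives us the function's values only on a finite subset of the true domain.

```java
import java.util.Map;

// A dataset viewed as a partial function from inputs to labels.
// The "true" function is defined on all images; the dataset only
// records its values on the sampled subset.
class DatasetAsFunction {
    private final Map<String, String> labels; // image id -> class label

    DatasetAsFunction(Map<String, String> labels) {
        this.labels = labels;
    }

    String apply(String imageId) {
        // Returns null outside the sampled subset: the dataset does not
        // tell us the true function's value there.
        return labels.get(imageId);
    }
}
```

Training then tries to generalize from these recorded mappings to the whole domain.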
> However, it will be confusing for functions that do not have trainable weights: what do they learn?
A learned function should also not be thought of as requiring parameters: it can have zero or more. So something like Flatten could be expressed as a learned function. It trains vacuously: it learns all of the parameters it has, but there are no parameters to learn. It is not that you can't do it, just that a parameterless function is not too useful alone. In conjunction with other LearnedFunctions that do contain parameters, it can be quite useful.
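As a sketch of this, here is a parameterless Flatten expressed against a hypothetical minimal `LearnedFunction` interface (this interface is invented for illustration and is not the actual DJL API):

```java
import java.util.Collections;
import java.util.List;

// Hypothetical minimal interface for illustration only, not the DJL API.
interface LearnedFunction {
    float[] forward(float[][] input);  // simplified: 2-D in, 1-D out
    List<float[]> parameters();        // trainable parameters, may be empty
}

// Flatten has zero parameters: it "trains" vacuously.
class Flatten implements LearnedFunction {
    @Override
    public float[] forward(float[][] input) {
        int rows = input.length, cols = input[0].length;
        float[] out = new float[rows * cols];
        for (int i = 0; i < rows; i++) {
            System.arraycopy(input[i], 0, out, i * cols, cols);
        }
        return out;
    }

    @Override
    public List<float[]> parameters() {
        return Collections.emptyList(); // nothing to learn
    }
}
```

A trainer can treat this uniformly with parameterized functions: it simply finds an empty parameter list.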
> If we refer them to resources like the Deep Learning specialization on Coursera or cs231n, they will already know the layer/module concept from PyTorch or TensorFlow. It will be easier for them to relate to a concept that is the same as, or similar to, what most deep learning resources use.
We can mention some of the other possible names in the javadoc for LearnedFunction. I imagine users will be able to understand that different frameworks use different terms for the same idea, especially since every framework already uses a different term.
One other improvement, upon further thought: it might be better to use LearningFunction instead. The past tense Learned makes more sense for inference but is a bit odd during training; LearningFunction would work slightly better for both cases overall.
For what it's worth, here are my 2¢ about the renaming:
The suggestion is an improvement:
The suggestion is a detriment:
- `ScaledDotProductAttentionBlock` would become `ScaledDotProductAttentionLearnableFunction`. Unless, of course, one drops the naming convention that new `LearnableFunction`s append the name of the interface as a suffix (which might also make sense; the convention feels a bit like smurf programming).
- A `Block` still carries a lot of bookkeeping (`NDArray`s from `ParameterStore`s, serialization, etc.), which feels more like an enterprisey "block" with all the bookkeeping involved than a smooth function that focuses on the forward pass.

So I think the new naming convention would make sense if really all of the bookkeeping were done automagically for implementers of new classes in standard cases. Ideally it should be as convenient as it is in Google's Swift language extensions: https://tryolabs.com/blog/2020/04/02/swift-googles-bet-on-differentiable-programming/ This encourages rapid implementation and prototyping and prevents possible pitfalls.
So I think the suggestion is a step in the right direction, but it would require truly rethinking the base API around `Block`, `AbstractBlock`, `Parameter`, and `ParameterStore` to make it so smooth that adding a new `LearnableFunction` is as simple as adding a forward pass and declaring each parameter on a single line.
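The "declaring each parameter on a single line" idea might look roughly like this. This is a hypothetical sketch under invented names (`Param`, `Linear`, `register` are not the DJL API): declaring, sizing, and registering a parameter all happen in one constructor statement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in for a trainable parameter.
class Param {
    final float[] values;
    Param(int size) { this.values = new float[size]; }
}

// Illustrative linear layer: each parameter is declared, sized, and
// registered in a single line of the constructor.
class Linear {
    private final Map<String, Param> params = new LinkedHashMap<>();
    final Param weight;
    final Param bias;

    Linear(int in, int out) {
        this.weight = register("weight", new Param(in * out));
        this.bias = register("bias", new Param(out));
    }

    private Param register(String name, Param p) {
        params.put(name, p); // bookkeeping handled once, here
        return p;
    }

    Map<String, Param> parameters() { return params; }
}
```

With something like this, serialization and initialization could iterate over `parameters()` generically instead of requiring per-class boilerplate.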
I have thought about this a lot but could not come up with a workable solution. Frankly, my current pull request for extensions to `AbstractBlock` still feels a bit hackish, as it does not solve these problems either. Then again, in Java it is more complicated to build a truly convenient API, since the language is not as flexible as, say, Kotlin, where it is very easy to build smooth DSLs.
Techniques to further reduce boilerplate and have "real" `LearnableFunction`s would IMHO require bringing out the big guns for DSL design (just ideas, might not work):
- `ThreadLocal` storage to automatically assign the correct `NDArray` for the current forward pass's `Device`
- Automating the retrieval of `NDArray`s from `ParameterStore`s and serialization
- Why are adding a `Parameter` to a `LearnableFunction`, determining its size, and setting its initializer three different steps? If this were all done in the constructor, preferably aided by a convenient API, and the serialization were truly fully automated, then adding a new `LearnableFunction` would be as simple as declaring the `Parameter` members and overriding `forward`.
- A simpler `forward` signature for the most common use cases: the `params` are hardly ever used, and the `ParameterStore` is always used in exactly the same way. For the `Block` to truly be a function, it still carries too much explicit state around when chaining `forward` calls.

For truly novel and out-of-the-box designs, the kind of flexibility provided by the current API is important to keep around, but maybe it should be more hidden for everyday cases.
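To illustrate the last point, here is a sketch of what chaining could look like if `forward` were a plain function of its input, with no parameter store threaded through the call. All names here are invented for illustration; this is not the DJL `SequentialBlock` API.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// If each child is just a function of its input, sequential composition
// needs no explicit state passed along the chain.
class Sequential implements UnaryOperator<float[]> {
    private final List<UnaryOperator<float[]>> children;

    Sequential(List<UnaryOperator<float[]>> children) {
        this.children = children;
    }

    @Override
    public float[] apply(float[] x) {
        for (UnaryOperator<float[]> f : children) {
            x = f.apply(x); // no ParameterStore argument in sight
        }
        return x;
    }
}
```

Each child would resolve its own parameters internally (e.g. via the thread-local idea above), keeping the call site a pure function application.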
Long story short: I think it is a good recommendation, but to be truly useful it should be part of a bigger redesign of the `Block` lifecycle and workings. Hence I do not know whether to give it a thumbs up or down...
This issue is my proposal to rename the `Block` class (https://github.com/awslabs/djl/blob/master/api/src/main/java/ai/djl/nn/Block.java) to `LearnedFunction`. I am hoping to collect feedback and have a discussion with the community about it.

Right now, we use Block as the main class for representing a neural network. We chose Block because it conveyed the idea of composability: the various Blocks can combine like Lego blocks. This addresses the question of how neural networks are built up from small differentiable functions (operators) into a full network.
My concern with Block is that it doesn't convey a sense of freedom. Blocks are rigid and can only go together in relatively fixed ways, and even those ways are not quite clear. Are SequentialBlock and ParallelBlock sufficient for everything you need? Can blocks have a variable number of children, or is it fixed? How do conditionals or loops fit into the analogy?
That is why I am thinking that LearnedFunction might be a clearer representation. It can do pretty much anything a function can do, and any programmer should be aware of what functions do. This makes it clear you can do things like compose functions, call other functions, and use control flow.
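The control-flow point is worth making concrete: a loop or a conditional inside a network component is awkward to express as a fixed arrangement of blocks, but trivial when the component is just a function. A hedged sketch (the class name is invented; `step` stands in for any learned sub-function):

```java
import java.util.function.UnaryOperator;

// A "function-style" component that applies a learned sub-function
// repeatedly until a condition holds: ordinary control flow, hard to
// draw as a static block diagram.
class RepeatUntilSmall implements UnaryOperator<Double> {
    private final UnaryOperator<Double> step;

    RepeatUntilSmall(UnaryOperator<Double> step) {
        this.step = step;
    }

    @Override
    public Double apply(Double x) {
        while (Math.abs(x) > 1.0) { // data-dependent loop
            x = step.apply(x);
        }
        return x;
    }
}
```

A SequentialBlock/ParallelBlock vocabulary has no natural slot for the data-dependent `while`; a function does.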
It is also a clearer representation of what the Block class actually represents. The first two paragraphs of the Block javadoc, copied below, clearly show the ideology of a LearnedFunction:
There are also some concerns about this rename. First, the name Block is used by other frameworks like Gluon (although newer TF/Keras uses layers and PyTorch uses Module).
The other concern is that LearnedFunction is a more abstract concept than Block. Block, although not a perfectly accurate description, is easier to understand, which could make it easier for new users to pick up deep learning with DJL; a very abstract concept, on the other hand, would make that harder.
Please comment below if you have any other thoughts, ideas, or concerns regarding this. Also, add a reaction to the main description with thumbs up (+1) if you agree with the rename and a thumbs down (-1) if you think it is a bad idea.