deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0

Rename Block to LearnedFunction #63

Closed zachgk closed 3 years ago

zachgk commented 4 years ago

This issue is my proposal to rename the Block class (https://github.com/awslabs/djl/blob/master/api/src/main/java/ai/djl/nn/Block.java) to LearnedFunction. I am hoping to collect feedback and have a discussion with the community about it.

Right now, we use Block as the main class for representing a neural network. We chose Block because it conveyed the idea of composability: that the various Blocks can combine like Lego blocks. This addresses the question of how neural networks are built up from small differentiable functions (operators) into a full network.

My concern with Block is that it doesn't convey a sense of freedom. Blocks are rigid and can only fit together in relatively fixed ways, yet the ways Blocks can combine are not quite clear. Are SequentialBlock and ParallelBlock sufficient for everything you need? Can blocks have a variable number of children, or is the number fixed? How do conditionals or loops fit into the analogy?

That is why I am thinking that LearnedFunction might be a clearer representation. It can do pretty much anything a function can do, and any programmer already knows what functions do. This makes it clear that you can do things like compose functions, call other functions, and use control flow.
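To make the control-flow point concrete, here is a minimal sketch in plain Java (hypothetical names, not DJL's actual API): a function body can branch and loop freely, whereas expressing the same thing purely through fixed container blocks like SequentialBlock is awkward.

```java
import java.util.function.UnaryOperator;

/** Sketch: a "learned function" is just a function whose body may use
 *  arbitrary control flow, not only fixed composition of containers. */
public class ControlFlowSketch {
    // A tiny stand-in for an NDList: a plain float array.
    static float[] applyRepeated(float[] x, UnaryOperator<float[]> layer, int repeats) {
        float[] y = x;
        for (int i = 0; i < repeats; i++) { // loops are natural in a function body
            y = layer.apply(y);
        }
        return y;
    }

    public static void main(String[] args) {
        UnaryOperator<float[]> doubler = v -> {
            float[] out = new float[v.length];
            for (int i = 0; i < v.length; i++) {
                out[i] = 2 * v[i];
            }
            return out;
        };
        float[] result = applyRepeated(new float[] {1f, 2f}, doubler, 3);
        System.out.println(result[0] + " " + result[1]); // 8.0 16.0
    }
}
```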

It is also a clearer representation of what the Block class actually represents. The first two paragraphs of the Block javadoc, copied below, clearly show the ideology of a LearnedFunction:

A {@code Block} is a composable function that forms a neural network.

Blocks serve a purpose similar to functions that convert an input NDList to an output NDList. They can represent single operations, parts of a neural network, and even the whole neural network. What makes blocks special is that they contain a number of parameters that are used in their function and are trained during deep learning. As these parameters are trained, the functions represented by the blocks get more and more accurate. Each block consists of the following components:
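The javadoc's description (a composable function with parameters) can be sketched in a few lines of plain Java. All names here are hypothetical illustrations, not DJL's actual API, and an NDList is modeled as a bare float array:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of the javadoc's idea: a block is a function from input to
 *  output that also carries zero or more trainable parameters. */
interface MiniBlock {
    float[] forward(float[] input);  // the function from input NDList to output NDList
    List<float[]> parameters();      // the trainable parameter arrays
}

/** Elementwise y[i] = w[i] * x[i]: one trainable parameter vector, initialized to 1. */
class Scale implements MiniBlock {
    final float[] w;
    Scale(int size) { w = new float[size]; Arrays.fill(w, 1f); }
    public float[] forward(float[] x) {
        float[] y = new float[x.length];
        for (int i = 0; i < x.length; i++) y[i] = w[i] * x[i];
        return y;
    }
    public List<float[]> parameters() { return List.of(w); }
}

/** Composition: children chain like functions, and their parameters pool together,
 *  so a whole network is itself just another block. */
class Sequential implements MiniBlock {
    final List<MiniBlock> children = new ArrayList<>();
    Sequential add(MiniBlock b) { children.add(b); return this; }
    public float[] forward(float[] x) {
        for (MiniBlock b : children) x = b.forward(x);
        return x;
    }
    public List<float[]> parameters() {
        List<float[]> ps = new ArrayList<>();
        for (MiniBlock b : children) ps.addAll(b.parameters());
        return ps;
    }
}
```

Either name (Block or LearnedFunction) would describe this same structure; the question is only which name communicates it better.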

There are also some concerns about this rename. First, the name Block is used by other frameworks like Gluon (although newer TF/Keras uses Layer, and PyTorch uses Module).

The other concern is that LearnedFunction is a more abstract concept than Block. Block, although not a perfectly accurate description, is easier to understand, which could make it easier for new users to pick up deep learning with DJL. A very abstract concept, on the other hand, could make that harder.

Please comment below if you have any other thoughts, ideas, or concerns regarding this. Also, add a reaction to the main description with thumbs up (+1) if you agree with the rename and a thumbs down (-1) if you think it is a bad idea.

roywei commented 4 years ago

IMO, LearnedFunction is too abstract and generic. Although everyone who programs understands what a function is, the name will be very confusing because it is such a broad concept and does not help relate to deep learning. I assume LearnedFunction is meant to describe a function with parameters that can be learned in training. However, it will be confusing for functions that do not have trainable weights: what do they learn? In contrast, block/layer/module is easy to associate with small building pieces that can be stacked or concatenated to form a medium or larger architecture. Medium-sized constructs can be reused to form larger architectures, and the final architecture can be very deep with many layers/modules. This specific characteristic is part of what differentiates deep learning from traditional machine learning methods.

In addition, we can't currently ask Java users to rely only on our documentation and javadoc to learn about deep learning. If we refer them to resources like the deep learning courses on Coursera or cs231n, they will know the layer/module concept from PyTorch or TensorFlow. It will be easier for them to relate to a concept that is similar or identical to the one used by most deep learning resources.

I'm not against changing the name of Block, but if we change, I'd prefer layer or module.

zachgk commented 4 years ago

@roywei

IMO, LearnedFunction is too abstract and generic. Although everyone who programs understands what a function is, the name will be very confusing because it is such a broad concept and does not help relate to deep learning.

Let me try to explain the reasoning behind how Function relates to DL, because it isn't terribly clear. Think for a minute about the mathematical notion of a function: it has a set of objects in the domain, a set of objects in the codomain, and a map from each element of the domain to an element of the codomain. A function in programming is a procedure for going from each domain item to the corresponding codomain item.

A dataset is also a function. Take ImageNet: the images in ImageNet form the domain and the possible classes form the codomain. The labels for the data are then the function mappings. When I describe the Block as a LearnedFunction, the dataset is the function that it is learning.

In a more general sense, ImageNet should be a subset of all images classifiable into the 1000 classes. Ideally it would be a representative subset, but that is its own challenge. So our true goal is really to learn this larger function given only the ImageNet subset. This is why overfitting is a problem: the model learns the ImageNet function rather than the more general image classification function.
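Restated symbolically (my paraphrase of the argument above, with symbol names chosen for illustration):

```latex
% Let X be the set of all classifiable images, Y the set of 1000 classes,
% and f : X -> Y the "true" classification function we want to learn.
f : X \to Y
% ImageNet provides only a finite sample S together with f restricted to S:
S \subseteq X, \qquad \text{training observes only } f|_S
% Training produces a model g : X -> Y from f|_S alone. Overfitting means
% g agrees with f on the sample but diverges elsewhere:
g(x) = f(x) \;\; \forall x \in S, \qquad g(x) \neq f(x) \text{ for many } x \in X \setminus S
```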

However, it will be confusing for functions that do not have trainable weights: what do they learn?

A learned function should not be thought of as requiring parameters; it can have zero or more of them. So something like Flatten could be converted into a learned function. It trains vacuously: it learns all of the parameters it has, but there are no parameters to learn. The point is not that you can't do it, but that alone it might not be too useful. In conjunction with other LearnedFunctions that do contain parameters, it can be quite useful.
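The "vacuous training" idea can be made concrete with a small sketch in plain Java (hypothetical names, not DJL's actual Flatten): the function reshapes its input and reports an empty parameter list, so a training pass over its parameters is a no-op.

```java
import java.util.List;

/** Sketch: a parameterless "learned function". Flatten maps a 2-D array
 *  to 1-D; with zero parameters, it "trains vacuously". */
class FlattenSketch {
    static float[] flatten(float[][] x) {
        int rows = x.length;
        int cols = x[0].length;
        float[] out = new float[rows * cols];
        for (int r = 0; r < rows; r++) {
            // Copy each row into the right slice of the flat output.
            System.arraycopy(x[r], 0, out, r * cols, cols);
        }
        return out;
    }

    static List<float[]> parameters() {
        return List.of(); // nothing to learn: training updates an empty set
    }
}
```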

If we refer them to resources like the deep learning courses on Coursera or cs231n, they will know the layer/module concept from PyTorch or TensorFlow. It will be easier for them to relate to a concept that is similar or identical to the one used by most deep learning resources.

We can mention some of the other possible names in the javadoc for LearnedFunction. I imagine users will be able to understand that different frameworks use different terms for the same idea, especially since every framework uses a different term.


One other improvement, upon further thought: it might be better to use LearningFunction instead. The past tense Learned makes sense for inference but is a bit odd during training. LearningFunction would do slightly better for both cases overall.

chenkelmann commented 4 years ago

For what it's worth, here are my 2¢ about the renaming:

The suggestion is an improvement:

The suggestion is a detriment:

So I think the new naming convention would make sense if really all of the bookkeeping were done automagically for implementers of new classes in standard cases. Ideally it should be as convenient as it is in Google's Swift language extensions: https://tryolabs.com/blog/2020/04/02/swift-googles-bet-on-differentiable-programming/ This encourages rapid implementation and prototyping and prevents possible pitfalls.

So I think the suggestion is a step in the right direction, but it would require truly rethinking the base API around Block, AbstractBlock, Parameter, and ParameterStorage to make things so smooth that adding a new LearnableFunction is as simple as adding a forward pass and declaring each parameter on a single line.

I have thought about this a lot but could not come up with a workable solution. Frankly, my current pull request for extensions to AbstractBlock still feels a bit hackish, as it does not solve these problems. Then again, in Java it is more complicated to build a truly convenient API, since the language is not as flexible as, say, Kotlin, where it is very easy to build smooth DSLs.

Techniques to further reduce boilerplate and have "real" LearnableFunctions would IMHO require bringing out the Big Guns for DSL design (just ideas, might not work):

For truly novel and out-of-the-box designs the kind of flexibility provided by the current API is important to still have around, but maybe it should be more hidden for everyday cases.

Long story short: I think it is a good recommendation, but in order to be truly useful it should be part of a bigger redesign of the Block lifecycle and workings.

Hence I do not know whether to give it a thumbs up or down...