erodola / DLAI-s2-2021

Teaching material for the course of Deep Learning and Applied AI, 2nd semester 2021, Sapienza University of Rome

The bitter lesson #3

Open noranta4 opened 3 years ago

noranta4 commented 3 years ago

Before diving into the Deep Learning & Applied AI course, it is worth challenging your motivations in taking this journey by reading the following post by Richard Sutton, one of the fathers of Reinforcement Learning:

http://www.incompleteideas.net/IncIdeas/BitterLesson.html

Which points of his vision of research in Artificial Intelligence do you agree with (if any), and which do you not (if any)?

Why do you wish to take a whole course on deep learning rather than following a couple of tutorials to learn how to train deep nets in practice?

If the answer is "because I want to understand rather than gain mere technical competence in modern AI", then, in light of Sutton's post, why do you want to understand deep learning methods rather than GPUs' working principles?

mikcnt commented 3 years ago

I'll try to break the ice by giving my point of view on Sutton's words and on the importance of a university course in deep learning, as opposed to best-practice tutorials and the like.

I think that the point Sutton is trying to make is very interesting. What he's suggesting is that, somehow, brute-forcing our way into science can be more productive than injecting human knowledge. While the second strategy might intuitively seem the better one, he provides important examples suggesting that, in fact, it is not.

I think there are different reasons why this is the case, the first one being that we're working and playing with a relatively young science. For this reason, I think a great portion of the scientific community is still trying to find the best way to do actual research in this field, and that way has probably not been found yet. Brute-forcing science could be our very way of plunging into the darkness.

I'm not sure that what Sutton is describing is what everybody should be trying. As with everything else, it should probably be a mix of both things: human knowledge and a good amount of brute force. Anyway, I'm pretty sure that to get the right answer we'll have to wait a couple of years (probably more than that).

Let's now talk about the need for a university course. I obviously think that a whole course on deep learning is much more useful than some tutorial out there, otherwise I wouldn't be here writing this :P The main reason is, actually, the very same thing I wrote before. We're facing something really new, and while DL can be used as a very practical tool, tutorials will only get you so far. In order to create something, you should at least know how things work.

fedeloper commented 3 years ago

I agree with almost all of his points, but I want to say that it is an incomplete vision. Nowadays we know for sure that computational methods perform better than knowledge-based methods, but when we use these algorithms we can lose focus. Let me explain: when we develop ML systems we are extracting information from past data, and we have to be aware that in a real social context this can be unfair, for example in deep-learning-based systems that predict whether a candidate is going to be hired (link) or in autonomous driving, where people's lives are at stake (link). In my opinion, the recently created ethical guidelines are the starting point for taking care of this possible problem.

From a more technical point of view, I am very interested in RuleML, which can be used where it is fundamental to know the rules behind a decision/prediction. I am taking this course to know deep networks deeply (sorry for the pun), and to use the knowledge gained to investigate rule extraction from trained networks.

fedeloper commented 3 years ago

I'm not sure that what Sutton is describing is what everybody should be trying. As with everything else, it should probably be a mix of both things: human knowledge and a good amount of brute force. Anyway, I'm pretty sure that to get the right answer we'll have to wait a couple of years (probably more than that).

To me, in this lesson Richard is saying that all attempts to bring human knowledge into models are a waste of time, because even when they yield better performance, it is only until computational power gets (and it surely will get) a little bit stronger.

ch-pat commented 3 years ago

I believe it's too early to draw the conclusion that

building in how we think we think does not work in the long run

and that

this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress

The fact that brute-force and learning methods are currently outpacing algorithms in which we inject our knowledge does not mean that this will keep holding true. We might just be missing some key insight on how to reason about chess (for example).

Therefore I think it's good that we have scientists working on both sides, as we do not know which method will ultimately be the "best". This is not necessarily "inhibiting further progress".

Two more points I'd like to add:

senad96 commented 3 years ago

For me, learning is one of the most mysterious concepts in science, along with consciousness and reasoning. We are able to understand how stars work on the inside and the first moments of the universe almost 14 billion years ago, and yet we cannot understand what happens when we read a book, how we organize information in our mind, or how we grasp more basic things like watching a movie with subtitles. Obviously we must consider intelligence as a physical and natural process that must be investigated by mathematical means. So, assuming the basic idea that our brain works by computing inputs and performing operations on these signals (which is reasonable to believe), it remains to understand how this information is processed. I agree with R. Sutton when he says that pure combinatorial exploration is not actually true intelligence; however, I believe that combinatorial exploration is a form of intelligence (not all of it): in fact, for that specific task Deep Blue beat the champion Kasparov, and in the eyes of an outsider that machine can be considered smart, at least for a chess battle. The difficult thing to model, in my opinion, is the concept of intuition. However, to do this we must first define what intuition is in a given situation (or problem), and we do not yet know what it is. Perhaps Reinforcement Learning, which models agents capable of deciding on certain actions, could give us an answer to this question.

I find the last words of R. Sutton's essay extremely interesting: "We want artificial intelligence agents that can discover like us, not that contain what we have discovered". This, in my opinion, should be the purpose of AI and the direction we need to take. In the end, learning how to use a GPU or following an online course in AI (or deep learning) does not make you understand what you are modeling, what the improvements of one model over another could be, or what it means mathematically to apply a certain algorithm rather than another. Actually, the difference is that an in-depth course in artificial intelligence (in my case, a master's degree) does not consider deep learning only as a tool to perform a task but, with the appropriate knowledge, as an approach that attempts to explain what intelligence is.

korovev commented 3 years ago

I think Sutton raises a very good point here (and an existential crisis or two), and in general I agree with his view. I understand the conflict that leads to a harsh distinction between what is human and what is computational power ("These two [human knowledge and computation] need not run counter to each other, but in practice they tend to."), a distinction that in the end might not even be there. In trying to seek a solution to a problem, humans create art along the path, and that is what motivates scientists from all fields to work (would it be the same if Munch had just said "I have anxiety" instead of painting his famous Scream?), i.e. the human "has" to be involved in order to be satisfied (thanks to this, though, we discovered a lot of things while searching for others, e.g. the cosmic background radiation). On the other hand, are we sure we know how human knowledge really works? What if the "functions" our brain "uses" are the exact same "statistical methods" an MDP uses, and we are just bad at formalizing or recognizing it (a scary example of a function that both our brain and the universe "apply"), thus introducing bias ("the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation")? We could think of ourselves as a deep model taking some input and spitting out some output without knowing exactly what happened inside, so an "AI agent that can discover like we can, not which contains what we have discovered" might be the n-th tool we can exploit to understand ourselves.

Why do you wish to take a whole course on deep learning rather than following a couple of tutorials to learn how to train deep nets in practice?

Because new knowledge comes from rearranging notions in creative ways, and a deep (:P) understanding of the underlying concepts and reasons is what gives you the freedom to play with them and find new solutions (and new problems).

why do you want to understand deep learning methods rather than GPUs' working principles?

Because I'm happy with the abstraction of the GPU that the creators of the frameworks give me, just like a GPU-only enthusiast might be happy with the abstraction provided by a deep learning framework.

francescoconti748 commented 3 years ago

Sutton raises a good point: "we want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done."

I am going against the tide, maybe I'll sound provocative and flaming: "we are dwarfs standing on the shoulders of giants" (Bernardo di Chartres), so that we can see further away, certainly not by virtue of any sharpness of sight on our part, or any physical distinction, but because we are carried high and raised up.

Our learning and generalization capabilities are strongly based on our history and on our models of physical laws, which are strictly task-based. Humans do not learn with massive computation and data, but with good past priors (indeed, we are highly biased). A good "human prior" should be used whenever possible; AI should stand on our shoulders.

I am attending these lectures for this reason: I want to stand on your shoulders to see further. PS: GPUs do not have shoulders.

LeonardoEmili commented 3 years ago

In my opinion, the question is: what's the role of the AI researcher when carrying out a research project?

I agree with what Sutton says about the fact that new, massive computational resources have enabled researchers to get better results. But how is that possible? Nowadays, we rely on more sophisticated models that allow us to obtain outstanding results in many machine learning tasks and, in some cases, to open up new horizons that were previously unknown.

the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation.

I think this happens because our reasoning must be guided by our knowledge. Just as an example, if we look back at the 1950s, researchers were experimenting with complex new approaches to tackle the automatic translation task in Natural Language Processing (NLP). These often consisted of an intricate set of rules that eventually incorporated the meaning of the sentences. Soon after, these ideas were overwhelmed by the use of statistical methods that instead leveraged massive amounts of data jointly with the availability of great computational resources.

this always helps in the short term, and is personally satisfying to the researcher

On the other hand, we should always keep in mind that innovation is carried out by researchers, not by machines. I believe that our role is to develop techniques to teach machines to solve complex problems without pretending to teach them how they should learn new things. In fact, it is often the case that we don't know the approach we should use to tackle a given problem, and in order to solve it we turn to machine learning techniques that may operate differently from what we as humans expect.


Personally, I decided to take a full course on deep learning since I am deeply interested in the topic, and in particular in understanding what the motivations and concerns are when using deep learning models.

Tommysayl commented 3 years ago

As a student who has just started studying these topics, I perceive these words as very harsh, and I therefore want to try to raise a few arguments against some parts of the "bitter lesson".

To begin with, less than a week ago a professor showed us how "mere" statistics have a learning limit in many NLP applications; he did so by bringing up the example of several languages that could not be deciphered because of the lack of explicit semantics associated with the symbols. I think this is a good example of a case where these methods need some kind of prior guidelines. Moreover, if I correctly understand Sutton's argument about Moore's law, we should keep in mind that transistors have a dimensional lower bound below which they cannot shrink. I guess this means that, at some point, we can't rely only on computational power to solve these problems.

Edit: one more thought: in the process of doing science, "results" are not necessarily the only valuable output the researcher aims at; in fact, we should also consider the knowledge that the developed model/experiment gives back, as our goal is often to gain a better understanding of the phenomenon of interest. I think there is therefore some kind of trade-off between the performance and the interpretability of models, even though how much a researcher values the first or the second characteristic is task-dependent.

AlessioLuciani commented 3 years ago

The consequences of Moore's law have been revolutionizing the world of computing for decades. They carry several advantages that should not be neglected. In the case of AI, researchers should take this aspect into account, besides expanding their knowledge of the subject.

Even though it's true that playing chess purely by leveraging massive search and experience cannot be considered a genuinely intelligent strategy, having the computational capability to evaluate many game situations is still a practical way of beating human players, and it should not be underestimated.

Therefore, in my opinion, gaining further knowledge about a subject and trying to understand the underlying logic is meaningful for healthy developments of AI, but in the meantime exploiting the value of big data and powerful processing units can be seen as a smart move.

Concepts seen from a human perspective can be very complex and not objective. One individual may focus on a particular detail while another has not even considered it in the process of recognition. I think that AI agents should be optimized to find the relevant features and patterns belonging to the object. This way, irrelevant complexity can be excluded, delivering a model of reality that is both simpler and more effective.

Following an entire course on Deep Learning allows you to learn how to analyze a given problem and develop an effective solution that exploits knowledge of the core fundamentals of neural networks. In light of Sutton's post, merely leveraging GPUs' computing power is a shortsighted move that falls far short of understanding the main features of any given concept.

VaibhavSingh-Resfeber commented 3 years ago

Which points of his vision of research in Artificial Intelligence do you agree with (if any), and which do you not (if any)?

As a beginner in this field, I don't have any conclusions yet, but after reading "The Bitter Lesson" I have understood some parts and agree with Sutton's points. Building in how we think we think doesn't work in the long term. The most promising point in this lesson is the emphasis on search and learning methods, which is really important because science changes so frequently that old ideas definitely won't work in all areas.

Why do you wish to take a whole course on deep learning rather than following a couple of tutorials to learn how to train deep nets in practice?

A few tutorials would give me only partial knowledge, which would not be beneficial in the long term if I need to do research in this field. I will definitely need deep knowledge to apply it in real settings.

edodema commented 3 years ago

Which points of his vision of research in Artificial Intelligence do you agree with (if any), and which do you not (if any)?

I was and still am a supporter of a theoretical approach over a brute-force one, so at first I was surprised by Sutton's statements and quite disappointed; anyway, he only stated facts, with which there is not much to disagree. However, it seems to me that he acknowledges big data and computational power as the winners of a war that never started, or at least never had to: software and hardware should collaborate instead, and I don't think that is the case today.

He cites Moore's law, but it doesn't seem to me something to rely on, since time has shown us that Moore's prediction was overly optimistic; anyway, that is understandable given that people were galvanised by the IT industry's successes and every day seemed a little closer to science fiction. Today the same thing is happening, to the point that "artificial intelligence", "deep learning" and "big data" have been reduced to mere buzzwords. Huge sums are invested every day in better hardware or bigger datasets, but we should not lose focus on research, for two main reasons:

We’ll have to see in the future how the game changes, if it does.

Why do you wish to take a whole course on deep learning rather than following a couple of tutorials to learn how to train deep nets in practice?

I decided to follow a course for one main reason: I know nothing about deep learning. A building rests on solid foundations, and nothing lasts longer than theory. I will eventually study the technicalities and cutting-edge developments through online courses, but with a good background it will be a downhill road.

nefelitav commented 3 years ago

I have to admit that it seems quite paradoxical, and also dissatisfying, to me that brute force turns out to be more effective than knowledge-based algorithms. However, I don't think that R. Sutton's intention is to discourage AI researchers or to downgrade human knowledge. Additionally, it is natural that humans reflexively and unconsciously tend to simulate human intelligence, and since AI is a relatively young field, this simulation isn't always ideal (e.g. bias, complexity). I strongly believe that the purpose of the Bitter Lesson is to make us realize that progress in AI comes when we try to think outside the box, not when we create a machine in our own image. I agree with the notion that a university course can't be compared to tutorials, because learning isn't just about storing knowledge but also about receiving the right stimuli to be able to doubt, to research, to think freely, and to understand, and this process takes time and effort. Finally, I would like to discover new ways in which Deep Learning could be used to serve humanity.

sh3rlock14 commented 3 years ago

🔴 Disagree on:
• These two (leveraging human knowledge of the domain and leveraging computational power) need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other.
• As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked---they tried to put that knowledge in their systems---but it proved ultimately counterproductive and a colossal waste of researcher's time.
• Building in our discoveries only makes it harder to see how the discovering process can be done.

🟠 Somewhat dis/agree on:
• We have to learn the bitter lesson that building in how we think we think does not work in the long run.
• The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds; instead we should build in only the meta-methods that can find and capture this arbitrary complexity.

🟢 Agree on:
• Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. … breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning.
• We want AI agents that can discover like we can, not which contain what we have discovered.

Even if it is true that the exploration of such a deep field is time- and resource-consuming, considering the two approaches as "opposites" might be misleading and counterproductive. Since it is not the case that only a few people work on these research projects, but rather a whole community, we should consider the effort of exploring all possible paths as the proper way to ensure that, at the very end, we reach the "right answer" to our problems, whatever they are. What if at a certain point the computational power were not sufficient, or what if we ran into an unknown problem? I personally think that to overcome such problems the right way would be to find common ground so that, once an insurmountable situation is reached for computational power alone, the complexity of human reasoning can take the lead and figure out the proper solution for the specific situation. Moreover, since the goal would be the one cited in the last point, I think that passing a certain amount of human knowledge directly to the machine is (up to now) fundamental: we are probably not yet ready to let the machine do all the work we have done so far during this period of growth, discovery, and progress, not only as beings capable of reasoning and problem solving, but also as social animals.

Behind these concerns lies the whole reason that should convince anyone aiming to conduct research in this field to join a university course rather than a (probably) more direct and practical but "drier" tutorial, in the sense that a tutorial would most likely ignore or bypass the whole context into which deep learning methods fall (i.e., our society and its multiple, complex facets).