digantamisra98 / Mish

Official Repository for "Mish: A Self Regularized Non-Monotonic Neural Activation Function" [BMVC 2020]
https://www.bmvc2020-conference.com/assets/papers/0928.pdf
MIT License

Small rant on the inertia of AI research #7

Closed · LifeIsStrange closed this 5 years ago

LifeIsStrange commented 5 years ago

Hi! This is not an issue per se and can be closed.

First of all, thank you for advancing progress in deep learning.

I'm just a random guy who wants to implement an AGI (lol) and, like many NLP engineers, I need highly accurate neural networks for fundamental NLP tasks (e.g. POS tagging, NER, dependency parsing, coreference resolution, WSD, etc.). They are all not very accurate (often below a 95% F1 score) and their errors add up.

Such limitations make NLP not yet suitable for many applications. This is why improving the state of the art (which can be tracked on paperswithcode.com) should be a crucial priority for academics.

In practice, many researchers have smart ideas and often slightly improve the state of the art by taking a "standard neural network" for the task and mixing in their one new idea.

I speak from experience: I've read most of the papers on the state-of-the-art leaderboards for most fundamental NLP tasks. Almost always they follow the pattern of a common baseline + one idea, theirs. The common baseline slowly evolves (e.g. now it's often a pre-trained model (say BERT) + fine-tuning + their idea).

Sorry to say, but to me "this" is absurd, where "this" means the fact that most researchers work in isolation, not integrating each other's ideas (or only with great inertia). I would have wished that the state of the art in a given NLP task were a combination of, say, 50 innovative and complementary ideas from different researchers.

You are researchers: do you have an idea why this is the case? If someone actually tried to merge all the good, complementary, compatible ideas, would they achieve the best, unmatchable state of the art? Why don't Facebook Research, Microsoft and Google, in addition to producing X new shiny ideas per month, try the low-hanging fruit of actually merging them in a coherent, synergistic manner? I would like to hear what you think of this major issue that slows AI progress.

As an example of such inertia, take Swish, Mish or RAdam: these are incredibly easy to try, just to see "does this give my neural network free accuracy gains?" Yet hardly any paper on the state-of-the-art leaderboards has tried Swish, Mish or RAdam, despite them being so simple to try (you don't need to change the neural network architecture). Not even the pre-trained models that so many papers depend on (I opened issues for each of them).
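To make the "you don't need to change the neural network" point concrete, here is a minimal PyTorch sketch (the toy model and layer sizes are made up for illustration, not taken from any paper): Mish is defined as f(x) = x * tanh(softplus(x)), and trying it amounts to swapping one module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation: f(x) = x * tanh(softplus(x))."""
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))

def make_classifier(act: nn.Module) -> nn.Sequential:
    # Purely illustrative architecture: only the activation differs between runs.
    return nn.Sequential(
        nn.Linear(128, 64),
        act,
        nn.Linear(64, 10),
    )

baseline = make_classifier(nn.ReLU())  # the usual default
candidate = make_classifier(Mish())    # drop-in replacement, same architecture
```

Everything else (data, loss, training loop) stays untouched, which is exactly why the "just try it" experiment is cheap.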

Thank you for reading.

digantamisra98 commented 5 years ago

Hi. Thanks for this comment; I'd be happy to put forward my view on it. Firstly, I don't tag myself as a researcher, nor do I represent any big player in the research scene, especially in machine/deep learning; I mostly indulge in experimenting and exploring mathematics.

I strongly agree that collaborative research, using the best of everything available, would result in the epitome of SOTA. But this domain has prompted the big players into a sort of competition where everyone is after maximum credit, and using a research outcome of a major competitor sometimes doesn't go down well (this is what I think happens). I had a meeting with my colleagues a couple of days ago: we are competing on the world leaderboards for CIFAR-10 and CIFAR-100 classification as listed on Paperswithcode.com, and we are planning to combine the best of everything (Ranger optimizer + EfficientNet (by Google) + CoordConv layers (by Uber) + Mish activation). I believe collaborative research is the way forward.

Indeed, I agree that even seemingly small factors like the activation function get overlooked because of how knowledge is shared: almost all courses and resources talk about ReLU, Leaky ReLU and Tanh, and hardly anyone knows that activation functions like Swish, Mish and SQNL exist. There are many factors involved here, but I'm glad someone could voice this. I'm trying my best to build something using the best of all worlds, though there are many limitations stacked against me as an individual with limited resources. Thank you for this; I hope my reply makes sense.
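For readers unfamiliar with the pieces listed above, CoordConv in particular is simple enough to sketch. This is a minimal, illustrative PyTorch version (the class name and argument handling are mine, not taken from Uber's release): it concatenates normalized x/y coordinate channels onto the input before an ordinary convolution.

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Minimal CoordConv sketch: append normalized coordinate channels,
    then apply a regular Conv2d over the augmented input."""
    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels + 2, out_channels, kernel_size, **kwargs)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device, dtype=x.dtype)
        xs = torch.linspace(-1, 1, w, device=x.device, dtype=x.dtype)
        ys = ys.view(1, 1, h, 1).expand(b, 1, h, w)
        xs = xs.view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))

layer = CoordConv2d(3, 16, kernel_size=3, padding=1)
out = layer(torch.randn(2, 3, 32, 32))  # -> shape (2, 16, 32, 32)
```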

Best,
Diganta

LifeIsStrange commented 5 years ago

@digantamisra98 Well, your answer really pleased me, thank you too!

Especially this: I didn't know about SQNL. And "we are planning to combine the best of everything (Ranger optimizer + EfficientNet (by Google) + CoordConv layers (by Uber) + Mish activation)" is so awesome to read, I wish you good luck! Don't feel obligated ^^ but I would like to hear news about this neural network when it's ready, even if it doesn't end up being SOTA. Also, I might tell you about my AGI if I have one some day, with SOTA results too ^^

Off topic: I don't know if you're aware, but BERT is no longer the generalist state of the art among pre-trained models; it has been surpassed by XLNet (with Transformer-XL): https://paperswithcode.com/paper/xlnet-generalized-autoregressive-pretraining

LifeIsStrange commented 5 years ago

@digantamisra98 BTW, this is not the right place for an off-topic discussion (but what is?)

It's not every day I can speak to someone knowledgeable in AI and maths, so: I see you're "On a Mathematical Adventure"! What areas of mathematics interest you the most? Are there any recent advances in mathematics that you think are worth sharing? I'm interested in mathematics mostly for finding mind tools, helping to improve my rationality (cf. lesswrong.com / rationalwiki.org, the Bayesianist community) or my problem-solving / visualization skills, or for use cases in programming. I'm also interested in automated theorem checkers (and, to a lesser extent, provers) and in the foundations of mathematics / formal logic. And there is something "new" at the intersection of all that: homotopy type theory (https://en.m.wikipedia.org/wiki/Homotopy_type_theory). Have you heard of it?

digantamisra98 commented 5 years ago

@LifeIsStrange Thank you for the appreciation. I'll make a repository where I'll push the whole codebase for the CIFAR-10 and CIFAR-100 experiments. I've been aware of XLNet taking over BERT for a while. I'm mostly interested in the Riemann Hypothesis, K-theory, ring theory, graph theory, Morita equivalence and their subfields; however, I'm just getting started with them. Right now I mostly work on general algorithms and complexity. Homotopy type theory looks interesting; I'll do some digging on that. Thanks for providing that information, and all the best!

LifeIsStrange commented 5 years ago

@digantamisra98

"we are planning to combine the best of everything (Ranger optimizer + EfficientNet (by Google) + CoordConv layers (by Uber) + Mish activation)"

Optimizers are evolving really quickly ^^ According to https://github.com/mgrankin/over9000/blob/master/README.md, RangerLars now seems even better than Ranger!
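As a rough sketch of how low-friction such an optimizer swap is: only the optimizer constructor changes, and the rest of the training loop stays the same. Ranger and RangerLars live in third-party repos (such as the one linked above), so the Ranger import below is an assumption and left commented out; torch.optim.RAdam is only available in newer PyTorch releases.

```python
import torch
import torch.nn as nn

# Toy model, purely for illustration.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Baseline optimizer from core PyTorch:
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

# Candidate swaps, identical training loop otherwise:
# opt = torch.optim.RAdam(model.parameters(), lr=3e-4)  # requires a recent PyTorch
# from ranger import Ranger   # assumed third-party package, e.g. from the repos above
# opt = Ranger(model.parameters(), lr=3e-3)
```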

I heard about RangerLars on this thread: https://forums.fast.ai/t/meet-mish-new-activation-function-possible-successor-to-relu/53299 There are many valuable optimization tips there too.

Related: https://forums.fast.ai/t/meet-ranger-radam-lookahead-optimizer/52886/18

Maybe you should talk to LessW2020.

Also: have you heard of https://github.com/eBay/AutoOpt? This might interest you too: https://github.com/lessw2020/auto-adaptive-ai

digantamisra98 commented 5 years ago

@LifeIsStrange Yes, I'm going through all the threads related to Mish and I've dropped Less a message; hopefully he replies. He also recently beat the world leaderboard score for 5-epoch ImageWoof classification test accuracy using Mish and Ranger along with other SOTA techniques. Find the thread here: https://forums.fast.ai/t/how-we-beat-the-5-epoch-imagewoof-leaderboard-score-some-new-techniques-to-consider/53453

glenn-jocher commented 4 years ago

@LifeIsStrange there is no such thing as 'free' gains. The gains you speak of always come with a compromise. See https://github.com/digantamisra98/Mish/issues/18

Other than that, mixing previous ideas and adding your own is what scientists and researchers already do all around the world every day. Newton said, "If I have seen further it is by standing on the shoulders of giants."