csabaiBio / elte_ml_journal_club

Machine learning journal club
https://csabaibio.github.io/elte_ml_journal_club/

suggestions #2

Closed. icsabai closed this issue 2 years ago.

icsabai commented 4 years ago

I would be curious about this:

Exploring Weight Agnostic Neural Networks. Tuesday, August 27, 2019. Posted by Adam Gaier, Student Researcher, and David Ha, Staff Research Scientist, Google Research, Tokyo. https://ai.googleblog.com/2019/08/exploring-weight-agnostic-neural.html

icsabai commented 2 years ago

OpenAI has released a new version of DALL-E, which we reviewed last year:

Nichol A, Dhariwal P, Ramesh A, Shyam P, Mishkin P, McGrew B, Sutskever I, Chen M. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741. 2021 Dec 20.

The model is smaller (only 3.5 billion parameters :-) and performs better than DALL-E. It uses "guided diffusion models", which have their roots in non-equilibrium statistical physics. The background is in these papers:

Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. and Ganguli, S., 2015. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning (pp. 2256-2265). PMLR.

Ho, J., Jain, A. and Abbeel, P., 2020. Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239.
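
A minimal sketch of the denoising-diffusion idea from the Ho et al. paper (not GLIDE's actual code; the schedule values, shapes, and the toy data below are my own illustrative assumptions): data is gradually diffused into Gaussian noise, and a network trained to predict the injected noise is used to run the process in reverse.

```python
import numpy as np

# Linear noise schedule (illustrative values, not GLIDE's actual schedule)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, rng):
    """Forward process: diffuse the data x0 toward pure Gaussian noise at step t
    (this is the 'non-equilibrium' diffusion direction)."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise

def p_sample_step(xt, t, eps_hat, rng):
    """One reverse (denoising) step, given the model's noise estimate eps_hat."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (xt - coef * eps_hat) / np.sqrt(alphas[t])
    if t > 0:
        mean += np.sqrt(betas[t]) * rng.standard_normal(xt.shape)
    return mean

# Toy usage: noise a data point, then take one reverse step with a dummy estimate.
# In DDPM the network eps_theta(x_t, t) is trained to predict the injected noise
# by minimizing ||noise - eps_theta(x_t, t)||^2 over random t.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
xt, noise = q_sample(x0, 500, rng)
x_prev = p_sample_step(xt, 500, noise, rng)
```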

These would be nice reads for one of the upcoming seminars!

TandemElephant commented 2 years ago

There were some concerns that NNs only "interpolate" between the training data points and thus cannot really generalize. This paper claims to prove the contrary:

Learning in High Dimension Always Amounts to Extrapolation: https://arxiv.org/pdf/2110.09485.pdf
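
A toy numerical check of the paper's central geometric point (my own experiment, not taken from the paper): the fraction of new Gaussian samples that fall inside the convex hull of a training sample, tested here with a small linear-programming feasibility problem, collapses quickly as the dimension grows.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, points):
    """Check whether x is a convex combination of the rows of `points`
    via a feasibility LP (usable in higher dimensions than Qhull)."""
    n = points.shape[0]
    A_eq = np.vstack([points.T, np.ones(n)])   # sum_i w_i p_i = x, sum_i w_i = 1
    b_eq = np.append(x, 1.0)
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return res.success

rng = np.random.default_rng(0)
n_train, n_test = 500, 200
for d in (2, 5, 10, 20, 50):
    train = rng.standard_normal((n_train, d))
    test = rng.standard_normal((n_test, d))
    frac = np.mean([in_convex_hull(x, train) for x in test])
    print(f"d={d:3d}  fraction of test points inside the hull: {frac:.2f}")
```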

icsabai commented 2 years ago

After a quick look at the paper @TandemElephant sent in the previous post, I would somewhat qualify the statement. They are not really saying, positively, that NNs can generalize; rather, they call attention to the problem of extending the "intuitive" 1-2 dimensional concept of interpolation/extrapolation. It is well known (but often forgotten) that in very high-dimensional spaces "there is no volume", or "everything is on the surface", and low-dimensional analogies are often misleading. Hence, as they argue, new points in the test set will almost surely lie outside the convex hull of the training set, so in the strict sense we always extrapolate.

I still have to read the paper in more detail (or hope that some of you will, and present it :-), but as far as I see they stick to the "convex hull" concept. That may be correct in a mathematical sense, but it may not capture the true essence of interpolation vs. extrapolation. My physicist instinct says that instead we should use, e.g., the distance from the closest training-set point (perhaps relative to the average such distance within the training set). Another measure that may work for continuous, not-that-high-dimensional spaces (e.g. the projected embedding spaces they mention) would be the bounding box of the k nearest neighbors instead of the convex hull. (We used the latter, for example, for our photometric redshift catalog for SDSS, but that was just 5D. It turned out to be a good metric for inferring the reliability of the predictions.)

With not much work we could redo the analysis they present and, if it works, write a "response" paper.
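
For anyone who volunteers, here is a rough sketch of the two alternative measures I have in mind (the data, the value of k, and the 5D toy setup are placeholders, not the actual SDSS analysis):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_scores(train, test, k=5):
    """For each test point: its nearest-training-point distance relative to the
    typical nearest-neighbor distance inside the training set, and whether it
    falls inside the axis-aligned bounding box of its k nearest training points."""
    nn = NearestNeighbors(n_neighbors=k).fit(train)
    # Typical internal NN distance of the training set (column 0 is the point itself).
    d_train, _ = NearestNeighbors(n_neighbors=2).fit(train).kneighbors(train)
    typical = d_train[:, 1].mean()

    d_test, idx = nn.kneighbors(test)
    rel_dist = d_test[:, 0] / typical
    in_box = np.array([
        np.all((x >= train[i].min(axis=0)) & (x <= train[i].max(axis=0)))
        for x, i in zip(test, idx)
    ])
    return rel_dist, in_box

# Toy example with placeholder data (5D, loosely echoing the photo-z case).
rng = np.random.default_rng(0)
train = rng.standard_normal((1000, 5))
test = rng.standard_normal((200, 5))
rel_dist, in_box = knn_scores(train, test)
print("median relative NN distance:", np.median(rel_dist))
print("fraction inside k-NN bounding box:", in_box.mean())
```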

Any volunteers?

ozkilim commented 2 years ago

@icsabai I am interested!

icsabai commented 2 years ago

It seems that the "interpolation" paper is so widely discussed that everything one could think of related to it has already been said:

icsabai commented 2 years ago

To me it seems quite evident that the number of parameters a model needs depends not only on the number of interpolated points but also on the input dimension. Now there is a NeurIPS paper and a Quanta summary:

https://www.quantamagazine.org/computer-scientists-prove-why-bigger-neural-networks-do-better-20220210/

Bubeck, S. and Sellke, M., 2021. A universal law of robustness via isoperimetry. Advances in Neural Information Processing Systems, 34.

https://openreview.net/forum?id=z71OSKqTFh7
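
As far as I can tell from the abstract, the main result can be paraphrased roughly as follows (my reading, to be checked against the paper):

```latex
% Rough paraphrase of the Bubeck--Sellke "law of robustness" (to be verified):
% any p-parameter model f that fits n noisy d-dimensional training points
% below the noise level must have a large Lipschitz constant, so a smooth
% (O(1)-Lipschitz) fit requires over-parametrization by a factor of d.
\[
  \operatorname{Lip}(f) \;\gtrsim\; \sqrt{\frac{n\,d}{p}}
  \qquad\Longrightarrow\qquad
  p \;\gtrsim\; n\,d \quad \text{for } \operatorname{Lip}(f) = O(1).
\]
```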

There may be more to it; I have only read the summary and the abstract. Volunteers are welcome to read the paper in detail and present it at one of the seminars.