Closed: bransGl closed this 4 months ago
"As this work is still in progress, these are preliminary results evaluated on grids up to 10×10."
I suspect this 80% is only on the small grids, which are disproportionately the "easy" tasks involving mere rotation/mirroring.
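For reference, here is a quick way to check how much of the public set that 10×10 restriction actually covers (a minimal sketch, assuming the standard fchollet/ARC repository layout with one JSON file per task under data/training/ and data/evaluation/):

```python
import json
from pathlib import Path

def fits_10x10(task):
    # A task "fits" if every input and output grid, across both the
    # demonstration (train) and test pairs, is at most 10x10.
    pairs = task["train"] + task["test"]
    return all(
        len(grid) <= 10 and len(grid[0]) <= 10
        for pair in pairs
        for grid in (pair["input"], pair["output"])
    )

for split in ("training", "evaluation"):
    tasks = [json.loads(p.read_text()) for p in Path("data", split).glob("*.json")]
    small = sum(fits_10x10(t) for t in tasks)
    print(f"{split}: {small}/{len(tasks)} tasks fit entirely within 10x10")
```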
Has anyone tried implementing this paper? I can't find a working demonstration anywhere.
Hi, I worked on it for a few months in 2020. I haven't read that article yet, though.
Hassan
I think there are several strange things about their paper.

- In a later presentation (2021) they say "NAR achieves 61.13% accuracy on the Abstraction and Reasoning Corpus" with no further explanation. Is that measured on the entire dataset (i.e., all grid sizes)? Otherwise, why is the number different? Their poster uses the exact same graphs as motivation as the old article, yet reports different numbers as the result; see https://eucys2021.usal.es/computing-03-2021/
- As far as I understand it, they evaluate on the public test set, yet they compare against the Kaggle competition, which ran on completely different, hidden tasks.
- They claim to solve 78.8% of 100 hidden tasks but don't explain how they get the .8 when the per-task tests are binary (see the sketch below this comment).
- There is no discussion of the impact of excluding all larger grids, which is especially relevant when comparing against the Kaggle competition.
- There is no source code, no one (AFAIK) has reproduced the results, and there is no official benchmark against the hidden test set.

It's possible they have devised an approach that is better than the previous state-of-the-art, but at this point I find it hard to take their numbers at face value.
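To make the 78.8% oddity concrete: with one binary score per task over 100 tasks, accuracy is k/100 for an integer k, so a fractional percent is impossible under that protocol. A sketch of the arithmetic (the 104-pair count is purely hypothetical, just to show how a .8 could arise if individual test pairs were scored instead, since some ARC tasks have multiple test pairs):

```python
# One binary score per task over 100 tasks: accuracy is k/100
# for some integer k, so 78.8% cannot occur.
possible = {k / 100 for k in range(101)}
print(0.788 in possible)  # False

# Scoring individual test pairs instead would allow fractional
# percentages. Hypothetical: 104 test pairs across 100 tasks,
# 82 of them solved.
print(f"{82 / 104:.1%}")  # 78.8%
```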
Thank you for sharing your thoughts. I share your perspective on some of these points, and I also noticed the paper used data augmentation. I am not convinced that deep learning in any shape or form can tackle escalating levels of generalization, from local (robustness), to broad (flexibility), and finally to extreme generalization.
I have found the Neural Abstract Reasoner paper: https://arxiv.org/pdf/2011.09860.pdf. They claim about 80% accuracy on this dataset. @fchollet What do you think, did they really get 80% accuracy on such a hard dataset? Looks suspicious to me. I can't find code, Kaggle submissions, or references to their paper.