boranhan opened 2 weeks ago
Yes, please could you share the scores or the private-leaderboard percentile ranks that were eventually used for the claim of outperforming 50% of human participants? Are the Python files in https://github.com/WecoAI/aideml/tree/main/sample_results enough to reproduce those numbers? I understand that these numbers are an average over 12 submissions.
@ZhengyaoJiang this was also requested earlier in https://github.com/WecoAI/aideml/issues/4#issuecomment-2138567087
Even sharing some raw results would be helpful for understanding WecoAI's performance. I checked OpenAI's MLE-bench, but it doesn't seem to report these numbers either.
I think you can try submitting the best solution to Kaggle after running the aide application, to confirm the performance/leaderboard percentile.
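For example, a minimal sketch of submitting a generated `submission.csv` via the official `kaggle` Python package (the file name, message, and competition slug below are placeholders):

```python
# Minimal sketch: submit a generated CSV to a Kaggle competition using the
# official "kaggle" package. Requires ~/.kaggle/kaggle.json credentials.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()

# Placeholder file name, message, and competition slug -- adjust per run.
api.competition_submit(
    "submission.csv",
    "AIDE best-solution submission",
    "tabular-playground-series-apr-2021",
)
```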
I tried submitting based on the provided code, and it gets me extremely bad results. Running the aide application for all the competitions would also require significant compute; rather than reproducing the results ourselves, it would be easier, and best, if you could share either the code or the scores/percentiles from your experiments.
For example, for the competition https://www.kaggle.com/competitions/tabular-playground-series-apr-2021/ I used the code to make this submission (https://www.kaggle.com/code/anirudhdagar/aide-solution-tabular-playground-series-apr-2021), only to get a private LB score of 0.70524, which places me at 1192/1250 on the leaderboard, i.e., outperforming only 4.64% of participants.
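For concreteness, the 4.64% above comes from this calculation on the private-leaderboard rank:

```python
# Fraction of leaderboard entries outperformed by a 1-indexed rank.
def fraction_outperformed(rank: int, total: int) -> float:
    return (total - rank) / total

print(f"{fraction_outperformed(1192, 1250):.2%}")  # 4.64%
```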
> I tried submitting based on the provided code
This is an example output and might differ from the final submission to Kaggle. You can also try different large language models and other metrics in the config.
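A minimal sketch of running AIDE programmatically, assuming the `Experiment` interface shown in the README (argument names and config keys may differ across versions; the data path, goal, and metric below are placeholders):

```python
# Sketch of a programmatic AIDE run; values are placeholders for your task.
import aide

exp = aide.Experiment(
    data_dir="path/to/competition/data",          # downloaded Kaggle data
    goal="Predict the target column for the test set.",
    eval="AUC",  # describe the competition's evaluation metric here
)
best_solution = exp.run(steps=10)

print(best_solution.valid_metric)  # validation score of the best solution
print(best_solution.code)          # generated code; not the final Kaggle score
```

The model used by the agent is set in the config; treat any specific key names as assumptions to verify against the config file shipped with your version.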
Hello, thank you for your team's great work.
I'm wondering if you can provide the private leaderboard percentiles for each individual competition?
Thanks in advance!
Boran