google-deepmind / code_contests

Apache License 2.0
2.04k stars 199 forks

Dataset Update. AlphaCode 2 #37

Open IvanSedykh opened 5 months ago

IvanSedykh commented 5 months ago

Thank you for your amazing work.

Is there any chance that the dataset will be updated, as mentioned in the AlphaCode 2 report?

Thank you.

felixgimeno commented 5 months ago

There are no current plans to release a dataset and/or update the github repo as far as I know. Thank you for the kind words.

barnett-yuxiang commented 5 months ago

AlphaCode 2's impressive results in competitive programming are attributed to several key techniques and strategies:

Massive Sampling and Filtering: AlphaCode 2 starts by generating about a million code samples for each problem. It then applies a filtering process to eliminate code samples that either don’t compile, don’t produce the expected output, or don’t match the problem description. This filtering process removes approximately 95% of the initially generated samples.
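The filtering step above can be sketched in a few lines. This is an illustrative toy, not AlphaCode 2's actual harness: the candidate programs, the `PUBLIC_TESTS` pairs, and the `run_program` helper are all hypothetical stand-ins for the report's compile-and-test pipeline.

```python
import contextlib
import io

# Hypothetical public test cases: (stdin, expected stdout) pairs.
PUBLIC_TESTS = [("3 4", "7"), ("10 -2", "8")]

# Hypothetical generated candidates: one correct, one wrong, one that crashes.
CANDIDATES = [
    "a, b = map(int, input().split()); print(a + b)",
    "a, b = map(int, input().split()); print(a - b)",
    "print(undefined_name)",
]

def run_program(source: str, stdin: str):
    """Execute a candidate with the given stdin; return stdout, or None on error."""
    lines = iter(stdin.splitlines())
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(source, {"input": lambda: next(lines)})
    except Exception:
        return None  # failed to run: filtered out
    return buf.getvalue().strip()

def passes_all(source: str) -> bool:
    return all(run_program(source, i) == o for i, o in PUBLIC_TESTS)

survivors = [c for c in CANDIDATES if passes_all(c)]
print(len(survivors))  # only the correct candidate survives
```

At AlphaCode 2's scale this filter runs over roughly a million samples per problem and discards about 95% of them, so it has to be cheap and fully automatic.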

Clustering Algorithm: From the remaining samples, AlphaCode 2 employs a clustering algorithm. This algorithm groups semantically similar code samples, which are then executed on new test inputs generated by a separate model. The outputs form a signature used for clustering, thereby reducing redundancy.
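The clustering idea can be sketched as follows: run each surviving program on fresh inputs and group programs whose outputs agree, so each group shares an output "signature". The inputs here are hard-coded for illustration; in the report they come from a separate test-input generation model.

```python
from collections import defaultdict

# Hypothetical model-generated test inputs.
GENERATED_INPUTS = ["1 2", "5 5", "-3 7"]

def signature(program, inputs):
    """Tuple of outputs over the generated inputs; programs that agree share it."""
    return tuple(program(i) for i in inputs)

def cluster_by_signature(programs, inputs):
    clusters = defaultdict(list)
    for p in programs:
        clusters[signature(p, inputs)].append(p)
    return list(clusters.values())

# Two semantically equivalent programs and one different one,
# modeled as callables from a stdin string to a stdout string.
add1 = lambda s: str(sum(map(int, s.split())))
add2 = lambda s: str(int(s.split()[0]) + int(s.split()[1]))
sub = lambda s: str(int(s.split()[0]) - int(s.split()[1]))

clusters = cluster_by_signature([add1, add2, sub], GENERATED_INPUTS)
print(len(clusters))  # add1 and add2 agree on every input, so: 2 clusters
```

Grouping by behavior rather than by source text is what removes redundancy: syntactically different but semantically identical samples collapse into one cluster.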

Scoring Model: A second model, Gemini Pro, is fine-tuned to predict the accuracy of the code samples. This model assigns a correctness score to each code sample in the remaining clusters. The best candidate from each cluster is selected based on this score for submission.
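The selection step then reduces to picking the top-scoring member of each cluster. In this sketch the scorer is a plain lambda over fake scores; in AlphaCode 2 it is the fine-tuned Gemini Pro model, and the sample IDs here are invented for illustration.

```python
def select_submissions(clusters, score):
    """Take the best-scoring sample from each cluster, best clusters first."""
    best_per_cluster = [max(cluster, key=score) for cluster in clusters]
    return sorted(best_per_cluster, key=score, reverse=True)

# Illustrative clusters of (sample_id, fake_correctness_score) pairs.
clusters = [
    [("a1", 0.9), ("a2", 0.7)],
    [("b1", 0.4), ("b2", 0.6)],
]
picks = select_submissions(clusters, score=lambda s: s[1])
print([sid for sid, _ in picks])  # ['a1', 'b2']
```

Ordering the per-cluster winners by score matters because Codeforces-style contests allow only a limited number of submissions per problem.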

Evaluation on Codeforces: The system's performance was evaluated on the Codeforces platform, where it competed in 12 recent contests involving over 8,000 participants. AlphaCode 2 solved 43% of the competition problems, a significant improvement over the original AlphaCode, which solved 25% of the problems. This performance places AlphaCode 2 above the 85th percentile, ranking between 'Expert' and 'Candidate Master' categories on Codeforces.

Efficiency and Performance: AlphaCode 2 is notably more sample efficient than its predecessor. It requires only about 100 samples to reach the level of performance that the original AlphaCode achieved with a million samples. However, despite its efficiency and high performance, AlphaCode 2 is still computationally intensive and too costly to operate at scale.

Use of Dynamic Programming: AlphaCode 2 is capable of dynamic programming, a technique that breaks down a complex problem into simpler sub-problems. This capability is especially important for solving programming problems that involve complex mathematics and theoretical computer science.
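As a concrete reminder of what dynamic programming means here, a textbook example (not from the report): each subproblem's answer is cached and reused instead of recomputed.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # memoize: each subproblem is solved once
def ways_to_climb(n: int) -> int:
    """Number of ways to climb n steps taking 1 or 2 steps at a time."""
    if n <= 1:
        return 1
    return ways_to_climb(n - 1) + ways_to_climb(n - 2)

print(ways_to_climb(10))  # 89
```

Without the cache this recursion is exponential; with it, the overlapping subproblems make the computation linear in `n`, which is the essence of the technique.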

Powered by Gemini: AlphaCode 2 is powered by Gemini or a variant of it (Gemini Pro), which is fine-tuned on coding contest data. This underlying model contributes significantly to the system's overall performance and capabilities.

Future Prospects: While AlphaCode 2 represents a substantial improvement in AI-driven coding, there remains a lot to be done before such systems can reliably match the performance of the best human coders. The current model requires substantial trial and error and depends heavily on filtering out bad code samples. There are ongoing efforts to further enhance these capabilities, potentially using more advanced models like Gemini Ultra.