Hi,
The paper looks impressive! Is there a plan to release the training dataset? I noticed that you used an enhanced theorem-proving dataset with 9,645k sequences, derived from DeepSeek-Prover-V1. Will the new dataset, including the natural language descriptions and intermediate tactic state information, be made available?
Additionally, I would greatly appreciate it if you could share the smaller dataset of 4.5k carefully selected instances used for reinforcement learning.