deepseek-ai / DeepSeek-Prover-V1.5

MIT License

Request for Release of Enhanced Theorem-Proving Dataset #2

Open · PrithwishJana opened this issue 3 months ago

PrithwishJana commented 3 months ago

Hi,

The paper looks impressive! Are there plans to release the training dataset? I noticed that you used an enhanced theorem-proving dataset of 9,645k sequences derived from DeepSeek-Prover-V1. Will this new dataset, including the natural-language descriptions and the intermediate tactic-state information, be made available?

I would also greatly appreciate it if you could share the smaller set of 4.5k carefully selected instances used for reinforcement learning.

fzyzcjy commented 2 months ago

+1. Thanks, DeepSeek, for the great work! I would appreciate it if the dataset could be open-sourced.

aldopareja commented 1 month ago

I would also love to see this, or a way to generate the data synthetically. Thank you, and great work indeed!