argilla-io / distilabel

⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.
https://distilabel.argilla.io
Apache License 2.0
1.12k stars 70 forks source link

Deepseek prover task #733

Open plaguss opened 2 weeks ago

plaguss commented 2 weeks ago

Description

⚠️ WIP

This PR implements tasks to replicate the paper: DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data.

Note: The prompts differ with the original implementation as most of prompt formatting is done in via the system prompt. This yielded better results while trying with Llama3 70B.

Examples

Base Example:

from distilabel.steps.tasks import DeepSeekProverAutoFormalization
from distilabel.llms.huggingface import InferenceEndpointsLLM

prover_autoformal = DeepSeekProverAutoFormalization(
    llm=InferenceEndpointsLLM(
        model_id="deepseek-ai/deepseek-math-7b-instruct",
        tokenizer_id="deepseek-ai/deepseek-math-7b-instruct",
    ),
)

Few-shot setting:

prover_autoformal = DeepSeekProverAutoFormalization(
    llm=InferenceEndpointsLLM(
        model_id="deepseek-ai/deepseek-math-7b-instruct",
        tokenizer_id="deepseek-ai/deepseek-math-7b-instruct",
    ),
    examples=[
        "theorem amc12a_2019_p21 (z : ℂ) (h₀ : z = (1 + Complex.I) / Real.sqrt 2) :\n\n((∑ k : ℤ in Finset.Icc 1 12, z ^ k ^ 2) * (∑ k : ℤ in Finset.Icc 1 12, 1 / z ^ k ^ 2)) = 36 := by\n\nsorry",
        "theorem amc12a_2015_p10 (x y : ℤ) (h₀ : 0 < y) (h₁ : y < x) (h₂ : x + y + x * y = 80) : x = 26 := by\n\nsorry"
    ]
)

Scorer:

from distilabel.steps.tasks import DeepSeekProverScorer
from distilabel.llms.huggingface import InferenceEndpointsLLM

prover_scorer = DeepSeekProverAutoFormalization(
    llm=InferenceEndpointsLLM(
        model_id="deepseek-ai/deepseek-math-7b-instruct",
        tokenizer_id="deepseek-ai/deepseek-math-7b-instruct",
    ),
)

Pending tasks:

Closes #732

codspeed-hq[bot] commented 2 weeks ago

CodSpeed Performance Report

Merging #733 will not alter performance

Comparing deepseek-prover (1c2d7fc) with develop (9d6a152)

Summary

✅ 1 untouched benchmarks