darinkishore / dspy

Stanford DSPy: The framework for programming with foundation models
MIT License
0 stars 0 forks source link

Sweep: `Signature` prompt skeleton #68

Open darinkishore opened 11 months ago

darinkishore commented 11 months ago

Details

Summary

To enhance the generative capabilities of DSPy, we are proposing the integration of an "immutable" prompt "skeleton" mechanism that mimics DeepMind's FunSearch methodology as described in their paper and illustrated in Extended Data Figure 1. This feature will allow DSPy to merge ideas by sampling different signature variations and prompting the Language Model (LM) to fill in dynamic content within a consistent structural backdrop.

Background

According to DeepMind's research, their prompting structure involved selecting and sorting two programs from a database (referred to as priority v0 and priority v1), to encourage the LM to merge ideas. The prompt also included a new priority function with an empty body (priority v2) for the LM to complete. This process results in the generation of novel programs by leveraging existing knowledge and the generative power of the LM within a fixed format.

Goals

Specifications

Action Items

Additional Context

The implementation should be mindful of the DSPy's Signature class's existing architecture and mechanisms. The immutable prompt skeleton should function effectively with current input/output field constraints. The new feature should lead to the innovation of relevant and diverse solutions within the DSPy framework, leveraging the established work from DeepMind's research.

Relevant Files

To guide the implementation and provide context around how and where the changes might integrate within the DSPy codebase, the following files have been identified as possibly relevant:

Ensure ALL action items are well thought out, thorough, tested, and finished.

Checklist - [X] Modify `dspy/teleprompt/signature_opt.py` ✓ https://github.com/darinkishore/dspy/commit/bfa03c255618d6a409d41584ab1f2605b0040dc7 [Edit](https://github.com/darinkishore/dspy/edit/sweep/signature_prompt_skeleton_1/dspy/teleprompt/signature_opt.py#L26-L208) - [X] Running GitHub Actions for `dspy/teleprompt/signature_opt.py` ✓ [Edit](https://github.com/darinkishore/dspy/edit/sweep/signature_prompt_skeleton_1/dspy/teleprompt/signature_opt.py#L26-L208) - [X] Create `dsp/modules/signature_sampler.py` ✓ https://github.com/darinkishore/dspy/commit/e9740c8a2e8a82b5a5153d1f44e39a83c021ea5e [Edit](https://github.com/darinkishore/dspy/edit/sweep/signature_prompt_skeleton_1/dsp/modules/signature_sampler.py) - [X] Running GitHub Actions for `dsp/modules/signature_sampler.py` ✓ [Edit](https://github.com/darinkishore/dspy/edit/sweep/signature_prompt_skeleton_1/dsp/modules/signature_sampler.py) - [X] Create `dsp/modules/module.py` ✓ https://github.com/darinkishore/dspy/commit/a47f75e332ce9ad2394321554d00c67912625fd3 [Edit](https://github.com/darinkishore/dspy/edit/sweep/signature_prompt_skeleton_1/dsp/modules/module.py#L1-L100) - [X] Running GitHub Actions for `dsp/modules/module.py` ✓ [Edit](https://github.com/darinkishore/dspy/edit/sweep/signature_prompt_skeleton_1/dsp/modules/module.py#L1-L100) - [X] Modify `dsp/primitives/compiler.py` ✓ https://github.com/darinkishore/dspy/commit/e450e00dbb0c0bd86a2eece039714086b8e55cce [Edit](https://github.com/darinkishore/dspy/edit/sweep/signature_prompt_skeleton_1/dsp/primitives/compiler.py) - [X] Running GitHub Actions for `dsp/primitives/compiler.py` ✓ [Edit](https://github.com/darinkishore/dspy/edit/sweep/signature_prompt_skeleton_1/dsp/primitives/compiler.py) - [X] Create `dsp/retriever/retriever_interface.py` ✓ https://github.com/darinkishore/dspy/commit/136e2080a9902364a49d41c8be24895986bedf11 [Edit](https://github.com/darinkishore/dspy/edit/sweep/signature_prompt_skeleton_1/dsp/retriever/retriever_interface.py) - [X] Running GitHub Actions for `dsp/retriever/retriever_interface.py` ✓ [Edit](https://github.com/darinkishore/dspy/edit/sweep/signature_prompt_skeleton_1/dsp/retriever/retriever_interface.py) - [X] Modify `dsp/primitives/compiler.py` ✓ https://github.com/darinkishore/dspy/commit/7a1b36b965c9527bd425b50a05dde635c1c612c3 [Edit](https://github.com/darinkishore/dspy/edit/sweep/signature_prompt_skeleton_1/dsp/primitives/compiler.py) - [X] Running GitHub Actions for `dsp/primitives/compiler.py` ✓ [Edit](https://github.com/darinkishore/dspy/edit/sweep/signature_prompt_skeleton_1/dsp/primitives/compiler.py)
sweep-ai[bot] commented 11 months ago

🚀 Here's the PR! #77

See Sweep's progress at the progress dashboard!
💎 Sweep Pro: I'm using GPT-4. You have unlimited GPT-4 tickets. (tracking ID: 5bb0f61e79)

Actions (click)

Sandbox Execution ✓

Here are the sandbox execution logs prior to making any changes:

Sandbox logs for bd7a12d
Checking dspy/teleprompt/signature_opt.py for syntax errors... ✅ dspy/teleprompt/signature_opt.py has no syntax errors! 1/1 ✓
Checking dspy/teleprompt/signature_opt.py for syntax errors...
✅ dspy/teleprompt/signature_opt.py has no syntax errors!

Sandbox passed on the latest main, so sandbox checks will be enabled for this issue.


Step 1: 🔎 Searching

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I think are relevant in decreasing order of relevance (click to expand). If some file is missing from here, you can mention the path in the ticket description. https://github.com/darinkishore/dspy/blob/bd7a12dbea4943ecdd0c689f7b62071f629b113f/README.md#L13-L21 https://github.com/darinkishore/dspy/blob/bd7a12dbea4943ecdd0c689f7b62071f629b113f/dspy/teleprompt/signature_opt.py#L26-L208 https://github.com/darinkishore/dspy/blob/bd7a12dbea4943ecdd0c689f7b62071f629b113f/dsp/primitives/compiler.py#L1-L170 https://github.com/darinkishore/dspy/blob/bd7a12dbea4943ecdd0c689f7b62071f629b113f/dsp/templates/template_v2.py#L1-L287 https://github.com/darinkishore/dspy/blob/bd7a12dbea4943ecdd0c689f7b62071f629b113f/dsp/templates/template_v3.py#L1-L71

Step 2: ⌨️ Coding

--- 
+++ 
@@ -25,6 +25,12 @@

 """
 class BasicGenerateInstruction(Signature):
+    placeholders = []
+
+    def __init__(self, placeholders=None, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.placeholders = placeholders if placeholders is not None else []
+
     """You are an instruction optimizer for large language models. I will give you a ``signature`` of fields (inputs and outputs) in English. Your task is to propose an instruction that will lead a good language model to perform the task well. Don't be afraid to be creative."""

     basic_instruction = dspy.InputField(desc="The initial instructions before optimization")
@@ -120,7 +126,7 @@
                     instruction, prefix = c.proposed_instruction.strip('"').strip(), c.proposed_prefix_for_output_field.strip('"').strip()

                     # Set this new module with our instruction / prefix 
-                    p_new.extended_signature.instructions = instruction
+                    p_new.extended_signature.instructions = instruction.format(*p_new.extended_signature.placeholders)
                     p_new.extended_signature.fields[-1] = p_new.extended_signature.fields[-1]._replace(name=prefix)

                     # Score the instruction / prefix 

Ran GitHub Actions for bfa03c255618d6a409d41584ab1f2605b0040dc7:

Ran GitHub Actions for e9740c8a2e8a82b5a5153d1f44e39a83c021ea5e:

Ran GitHub Actions for a47f75e332ce9ad2394321554d00c67912625fd3:

--- 
+++ 
@@ -7,6 +7,7 @@

 import dsp
 from datasets.fingerprint import Hasher
+from dsp.modules.signature_sampler import SignatureSampler

 if os.environ.get('DSP_NOTEBOOK_CACHEDIR'):
     training_data_directory = os.path.join(os.environ.get('DSP_NOTEBOOK_CACHEDIR'), 'compiler')
@@ -159,9 +160,17 @@
     return ft

 # 4. Return updated program.
-def compile(program, examples, target='ada'):
-    training_data = simulate(program, examples)
-    compiled_lm = finetune(training_data, target=target)
+def compile(self, program, examples, k, target='ada'):
+    signature_sampler = SignatureSampler()
+    sampled_signatures = signature_sampler.sample(program.signatures, k)
+    prompt = ''
+    for signature in sampled_signatures:
+        prompt += signature.instructions.format(*signature.placeholders)
+    compiled_prompt = program.compile(prompt)
+    # Here add code to persist the compiled prompt.
+    # This can be writing to a file or storing it in a database.
+    
+    compiled_lm = finetune(compiled_prompt, target=target)

     def compiled_program(*args, **kwargs):
         with dsp.settings.context(compiled_lm=compiled_lm, compiling=False):

Ran GitHub Actions for e450e00dbb0c0bd86a2eece039714086b8e55cce:

Ran GitHub Actions for 136e2080a9902364a49d41c8be24895986bedf11:

--- 
+++ 
@@ -7,6 +7,8 @@

 import dsp
 from datasets.fingerprint import Hasher
+from dsp.retriever.retriever_interface import RetrieverInterface
+from dsp.modules.signature_sampler import SignatureSampler

 if os.environ.get('DSP_NOTEBOOK_CACHEDIR'):
     training_data_directory = os.path.join(os.environ.get('DSP_NOTEBOOK_CACHEDIR'), 'compiler')
@@ -159,9 +161,19 @@
     return ft

 # 4. Return updated program.
-def compile(program, examples, target='ada'):
-    training_data = simulate(program, examples)
-    compiled_lm = finetune(training_data, target=target)
+def compile(self, program, examples, k, target='ada', database=None):
+    retriever = RetrieverInterface(database)
+    retrieved_signatures = retriever.retrieve("")  # Add appropriate query if needed
+    signature_sampler = SignatureSampler()
+    sampled_signatures = signature_sampler.sample(retrieved_signatures, k)
+    prompt = ''
+    for signature in sampled_signatures:
+        prompt += signature.instructions.format(*signature.placeholders)
+    compiled_prompt = program.compile(prompt)
+    # Here add code to persist the compiled prompt.
+    # This can be writing to a file or storing it in a database.
+    
+    compiled_lm = finetune(compiled_prompt, target=target)

     def compiled_program(*args, **kwargs):
         with dsp.settings.context(compiled_lm=compiled_lm, compiling=False):

Ran GitHub Actions for 7a1b36b965c9527bd425b50a05dde635c1c612c3:


Step 3: 🔁 Code Review

I have finished reviewing the code for completeness. I did not find errors for sweep/signature_prompt_skeleton_1.


🎉 Latest improvements to Sweep:


💡 To recreate the pull request edit the issue title or description. To tweak the pull request, leave a comment on the pull request. Join Our Discord