🚀 Here's the PR! #101

See Sweep's progress at the progress dashboard!

⚡ Sweep Basic Tier: I'm using GPT-4. You have 5 GPT-4 tickets left for the month and 3 for the day. (tracking ID: 88d622c2cc)

For more GPT-4 tickets, visit our payment portal. For a one week free trial, try Sweep Pro (unlimited GPT-4 tickets).

[!TIP] I can email you next time I complete a pull request if you set up your email here!

Actions (click)

[ ] ↻ Restart Sweep

GitHub Actions✓

Here are the GitHub Actions logs prior to making any changes:

Sandbox logs for 6679ea8

Checking docs/getting_started/README.md for syntax errors... ✅ docs/getting_started/README.md has no syntax errors! 1/1 ✓
Checking docs/getting_started/README.md for syntax errors...
✅ docs/getting_started/README.md has no syntax errors!

Sandbox passed on the latest main, so sandbox checks will be enabled for this issue.

Step 1: 🔎 Searching

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I think are relevant in decreasing order of relevance (click to expand). If some file is missing from here, you can mention the path in the ticket description.

https://github.com/darinkishore/dspy/blob/6679ea8025741555f9ab6dd3b2ed33ba6e945c71/README.md#L1-L386 https://github.com/darinkishore/dspy/blob/6679ea8025741555f9ab6dd3b2ed33ba6e945c71/docs/getting_started/README.md#L1-L342

Step 2: ⌨️ Coding

[X] Create docs/api_reference/modules/prompt_compression.md ✓ https://github.com/darinkishore/dspy/commit/3da185163c4890d236518ca9fbfc6e2032df613e Edit
Create docs/api_reference/modules/prompt_compression.md with contents:
• Create a new documentation file for the Prompt Compression module. This file should outline the purpose of the module, how it works, and examples of its usage. It should explain how the module condenses prompts to fit within the token limitations of various language models while retaining essential information. Include a section on integrating this module with existing DSPy workflows.
• Update docs/index.rst to include a reference to the new prompt_compression.md in the API Reference section.

[X] Running GitHub Actions for docs/api_reference/modules/prompt_compression.md ✓ Edit
Check docs/api_reference/modules/prompt_compression.md with contents:

Ran GitHub Actions for 3da185163c4890d236518ca9fbfc6e2032df613e:

[X] Create dspy/modules/prompt_compression.py ✓ https://github.com/darinkishore/dspy/commit/f6b510f3c22451dcd895c0d5f59b3e5a7a2011dd Edit
Create dspy/modules/prompt_compression.py with contents:
• Implement the Prompt Compression module. This Python file should define a class `PromptCompression` that inherits from `dspy.Module`. The class should implement methods for condensing input prompts based on summarization or distillation techniques. Ensure the module can be easily integrated into existing DSPy pipelines, with clear methods for input and output that align with DSPy's design philosophy.
• Modify dspy/modules/__init__.py to include the `PromptCompression` module, making it accessible as part of the DSPy framework.

[X] Running GitHub Actions for dspy/modules/prompt_compression.py ✓ Edit
Check dspy/modules/prompt_compression.py with contents:

Ran GitHub Actions for f6b510f3c22451dcd895c0d5f59b3e5a7a2011dd:

[X] Create dspy/compiler.py ✓ https://github.com/darinkishore/dspy/commit/5653709af0d9b21603ae566a0607564cdb18fe1c Edit
Create dspy/compiler.py with contents:
• Integrate the Prompt Compression module within the DSPy compiler logic. This involves modifying the compiler to optionally use the `PromptCompression` module when compiling programs, especially for tasks with lengthy descriptions that exceed the token limitations of the target language model.
• Add logic to the compiler that identifies when the context length might exceed the model's limitations and automatically applies prompt compression. Ensure this feature can be toggled by the user.
• Incorporate principle-based few-shot learning by enhancing the compiler's ability to prioritize and extract key principles or strategies from the input data. This might involve analyzing the input data for patterns or key elements that are crucial for the task and ensuring these are prominently featured in the compiled prompts or few-shot demonstrations.

[X] Running GitHub Actions for dspy/compiler.py ✓ Edit
Check dspy/compiler.py with contents:

Ran GitHub Actions for 5653709af0d9b21603ae566a0607564cdb18fe1c:

[X] Modify docs/getting_started/README.md ✓ https://github.com/darinkishore/dspy/commit/7d463230dd9e2b319e351a48548d9b0fdcae2101 Edit
Modify docs/getting_started/README.md with contents:
• Update the Getting Started guide to include information on the new Prompt Compression module and principle-based few-shot learning enhancements. Provide examples of how these features can be used to address context length limitations and improve the efficiency of few-shot learning with DSPy.
• Highlight scenarios where these features would be particularly beneficial, such as working with datasets with lengthy descriptions or complex scenarios that require distilling essential principles for effective learning.

--- 
+++ 
@@ -13,7 +13,7 @@

 To make this possible:

-- **DSPy** provides **composable and declarative modules** for instructing LMs in a familiar Pythonic syntax. It upgrades "prompting techniques" like chain-of-thought and self-reflection from hand-adapted _string manipulation tricks_ into truly modular _generalized operations that learn to adapt to your task_.
+- **DSPy** provides **composable and declarative modules** for instructing LMs in a familiar Pythonic syntax. It upgrades "prompting techniques" like chain-of-thought and self-reflection from hand-adapted _string manipulation tricks_ into truly modular _generalized operations that learn to adapt to your task_, including the new **Prompt Compression** for efficiently dealing with context length limitations and principle-based few-shot learning to focus on the underlying strategies or principles that are key to success.

 - **DSPy** introduces an **automatic compiler that teaches LMs** how to conduct the declarative steps in your program. Specifically, the **DSPy compiler** will internally _trace_ your program and then **craft high-quality prompts for large LMs (or train automatic finetunes for small LMs)** to teach them the steps of your task.

@@ -88,7 +88,7 @@

 **Your `__init__` method** declares the modules you will use. Here, `RAG` will use the built-in `Retrieve` for retrieval and `ChainOfThought` for generating answers. **DSPy** offers general-purpose modules that take the shape of _your own_ sub-tasks — and not pre-built functions for specific applications.

-Modules that use the LM, like `ChainOfThought`, require a _signature_. That is a declarative spec that tells the module what it's expected to do. In this example, we use the short-hand signature notation `context, question -> answer` to tell `ChainOfThought` it will be given some `context` and a `question` and must produce an `answer`. We will discuss more advanced **[signatures](#3a-declaring-the-inputoutput-behavior-of-lms-with-dspysignature)** below.
+Modules that use the LM, like `ChainOfThought`, require a _signature_. That is a declarative spec that tells the module what it's expected to do. Similarly, our new **Prompt Compression** module offers a straightforward interface for condensing lengthy inputs, ensuring efficiency in contexts with strict token limitations, while principle-based few-shot learning can be leveraged for capturing essential strategies or principles to guide the model's learning. In this example, we use the short-hand signature notation `context, question -> answer` to tell `ChainOfThought` it will be given some `context` and a `question` and must produce an `answer`. We will discuss more advanced **[signatures](#3a-declaring-the-inputoutput-behavior-of-lms-with-dspysignature)** below.

 **Your `forward` method** expresses any computation you want to do with your modules. In this case, we use the modules `self.retrieve` and `self.generate_answer` to search for some `context` and then use the `context` and `question` to generate the `answer`!
@@ -181,7 +181,7 @@
 ```

-Different teleprompters offer various tradeoffs in terms of how much they optimize cost versus quality, etc. For `RAG`, we might use the simple teleprompter called `BootstrapFewShot`. To do so, we instantiate the teleprompter itself with a validation function `my_rag_validation_logic` and then compile against some training set `my_rag_trainset`.
+Different teleprompters offer various tradeoffs in terms of how much they optimize cost versus quality, etc. Including our advancements such as principle-based few-shot learning, which significantly refines the compilation process by focusing on core principles instead of exhaustive details, enhancing learning efficiency. For `RAG`, we might use the simple teleprompter called `BootstrapFewShot`. To do so, we instantiate the teleprompter itself with a validation function `my_rag_validation_logic` and then compile against some training set `my_rag_trainset`.

 ```python
 from dspy.teleprompt import BootstrapFewShot

[X] Running GitHub Actions for docs/getting_started/README.md ✓ Edit
Check docs/getting_started/README.md with contents:

Ran GitHub Actions for 7d463230dd9e2b319e351a48548d9b0fdcae2101:

Step 3: 🔁 Code Review

I have finished reviewing the code for completeness. I did not find errors for sweep/addressing_context_length_limitations_in.

🎉 Latest improvements to Sweep:

New dashboard launched for real-time tracking of Sweep issues, covering all stages from search to coding.
Integration of OpenAI's latest Assistant API for more efficient and reliable code planning and editing, improving speed by 3x.
Use the GitHub issues extension for creating Sweep issues directly from your editor.

💡 To recreate the pull request edit the issue title or description. To tweak the pull request, leave a comment on the pull request.^{Something wrong? Let us know.}

This is an automated message generated by Sweep AI.

darinkishore / dspy

Addressing Context Length Limitations in DSPy #100

Details