assafelovic / gpt-researcher

GPT-based autonomous agent that does comprehensive online research on any given topic
https://gptr.dev
MIT License

Multi-Agents will have redundant content across different sections #548

Open DandinPower opened 1 month ago

DandinPower commented 1 month ago

Hello, I have tried the multi-agent implementation in the multi-agents folder, after first reading the multi-agent blog post. I found that even though each subsection focuses on a different topic, the content easily overlaps and discusses the same concepts across subsections. As a result, the final report often contains a lot of redundant content, which is not useful at all.

I initially tried to add guidelines like "Each subsection must not have redundant content across different subsections." However, since the reviewer and reviser agents only work on individual subsections, the guidelines can only be applied at the subsection level, not to the entire report. Consequently, the reviewers and revisers are unaware of what is happening in other subsections, making it difficult to resolve the redundancy problem.

I am considering a solution where we have a chief reviewer and reviser. After each subsection completes its research, the chief reviewer and reviser would evaluate the final research across all subsections, ensure there is no redundant content, and provide revision directions to each subsection group to restart their research.

I realize this kind of workflow would make the whole process more complex, waste more tokens, and add latency. However, I believe that if we can enforce global guidelines such as "Each subsection must not have redundant content across different subsections," it will improve the final report's robustness and usefulness.

assafelovic commented 1 month ago

Hey @DandinPower, that's a great discovery and definitely a huge improvement to the experience. At the moment (mostly for cost purposes) the report indeed does not take previous subtopics into consideration. To enable this, every iteration would need to see the entire report generated before it, which would be very costly in terms of LLM calls. I guess it's a tradeoff, but definitely something we can consider adding as an option.

DandinPower commented 4 weeks ago

Hello @assafelovic, thank you for your response! I understand that the current multi-agent workflow already incurs significant cost from LLM calls, and the trade-off between cost and utility always needs to be considered. However, I have a preliminary idea: we could sync the parallel research groups after each draft iteration. First, we let a chief reviewer view all drafts, then apply "report-level guidelines" to create dynamic guidelines for each subsection reviewer. For instance, if both Section A and Section B cover a certain concept, the chief reviewer could instruct one of the reviewers with a dynamic guideline like "Do not cover [this concept]." The original reviewer would then use both the standard and the dynamic guidelines to provide notes to the reviser.

With this workflow, the additional cost is just one LLM call per review iteration, plus a slight increase in latency for synchronization. It also makes "entire report"-level guidelines possible without repeatedly restarting subsection research, which would otherwise waste numerous LLM calls. I will try to build an agent flow based on this concept and compare its average cost and latency with the current workflow to make sure the costs are acceptable. If you're interested, I would be happy to share the results with you!
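To make the sync step a bit more concrete, here is a very rough sketch (the `call_llm` helper and the JSON format are just placeholders I'm assuming, not anything that exists in the codebase):

```python
# Hypothetical sketch of the chief-reviewer sync described above.
# `call_llm` stands in for whatever completion helper the project uses.
import json
from typing import Callable

def chief_review(drafts: dict[str, str], call_llm: Callable[[str], str]) -> dict[str, str]:
    """Look at all subsection drafts at once and return one dynamic guideline
    per subsection, e.g. 'Do not cover <concept>; section B already does'."""
    joined = "\n\n".join(f"## {topic}\n{draft}" for topic, draft in drafts.items())
    prompt = (
        "You are the chief reviewer of a multi-section research report.\n"
        "Find concepts that appear in more than one section and, for each "
        "section, write a short guideline telling its reviewer which "
        "overlapping concepts to drop. Answer as JSON mapping "
        "section topic -> guideline.\n\n" + joined
    )
    # One extra LLM call per review iteration.
    return json.loads(call_llm(prompt))

# Each subsection reviewer then appends its dynamic guideline to the static
# guidelines before producing notes for the reviser, e.g.:
# guidelines = task["guidelines"] + [dynamic_guidelines[topic]]
```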

antoremin commented 3 weeks ago

Hey, I played with the researcher and noticed the same issue. IMO the tradeoff is well worth it, since the alternative is removing redundant sections by hand. Maintaining coherence across the entire report seems paramount for real-world applications and complex subjects. Maybe it could be an option so users can choose. Excited about possible architecture iterations in this direction and would love to help!

DandinPower commented 3 weeks ago

Hello @antoremin, I have forked the repo and written a brief draft of the modification design. If you are interested, you are welcome to check the multi_agent_v2 folder for details!

assafelovic commented 3 weeks ago

Hey @DandinPower, would love help with a PR for this! I'm currently working on a new front-end UX/UI.

DandinPower commented 3 weeks ago

Hello @assafelovic, thank you for your invitation! I am willing to help with a PR after I finish the feature about the workflow I mentioned before.

assafelovic commented 3 weeks ago

@DandinPower @antoremin Thought about it deeper and I think there's a pretty decent approach that can work well: basically, we can save all written content in a vector DB and, for every new section, retrieve relevant information from it and take it into account when generating new content. This way we're not overloading the context with redundant content for each section generation, and it will most likely solve the issue. Wdyt?
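Just to make the idea concrete, something roughly like this could work (ChromaDB is used here purely as an example; the store choice and the chunking are open questions, and none of this is existing repo code):

```python
# Sketch only: store each finished section in a vector DB, then retrieve the
# most similar previously written chunks before drafting the next section.
import chromadb

client = chromadb.Client()
sections = client.create_collection(name="written_sections")

def save_section(topic: str, text: str, chunk_size: int = 800) -> None:
    """Chunk a finished section and index it for later sections."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sections.add(
        documents=chunks,
        ids=[f"{topic}-{i}" for i in range(len(chunks))],
        metadatas=[{"topic": topic}] * len(chunks),
    )

def context_for(next_topic: str, k: int = 5) -> list[str]:
    """Fetch chunks from earlier sections that are relevant to the next topic."""
    if sections.count() == 0:
        return []
    result = sections.query(query_texts=[next_topic], n_results=min(k, sections.count()))
    return result["documents"][0]

# The retrieved chunks are then passed to the writer prompt as
# "content already covered elsewhere: do not repeat it".
```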

DandinPower commented 3 weeks ago

Hello @assafelovic @antoremin,

I think this is a very good approach to reducing redundant content across subsections. It also allows content to be generated fully in parallel, since the data can be retrieved independently. Additionally, replacing extra LLM calls with embedding-model calls and vector searches is generally quicker and cheaper. However, I have some concerns about the vector-similarity-based approach:

  1. How do we set the similarity threshold so that each subsection's own content is covered well enough?
  2. How do we set the similarity threshold so that it doesn't pull in content that belongs to other subsections?
  3. If we use each subsection's draft topic to retrieve relevant content, then since the subsections are all related to the same overall topic, the three "subsection draft topic" embeddings will likely sit very close together in the vector space. That makes it difficult for a similarity-based method to retrieve exactly the content we want.

[Figure: three subsection topic embeddings (red) clustered close together among written-content chunks (gray), with the similarity-threshold ranges drawn as light yellow circles]

The figure shows that the three subsection topic embeddings (red points) are close to each other, while the gray points are the written content chunks. The light yellow circles represent the relevant range given the similarity threshold we set. This illustrates how difficult it is to retrieve the data without redundancy while still getting all the content we want.
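As a toy illustration of point 3 with made-up 2-D vectors (real embeddings are high-dimensional, but the effect is the same):

```python
# Toy example: when all subsection topics point in almost the same direction,
# a single similarity threshold can't decide which topic a chunk "belongs" to.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

topic_a = np.array([1.00, 0.10])
topic_b = np.array([0.95, 0.20])   # nearly the same direction as topic_a
chunk   = np.array([0.97, 0.15])   # a content chunk written for section A

print(cosine(topic_a, chunk))      # ~0.999
print(cosine(topic_b, chunk))      # ~0.999 as well, so topic B "claims" it too
```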

assafelovic commented 2 weeks ago

This is a great point! But consider that what we want is for the LLM to take previously written sections into consideration when writing the next ones. So it might be enough to retrieve similar chunks just as reasoning context for generating new content. I'd be happy to see how we can push this forward. Anyone want to take a stab at it? @antoremin @DandinPower

DandinPower commented 2 weeks ago

Hey @assafelovic @antoremin, sorry for the mix-up earlier. Here's my updated take on your idea: we create content section by section. Each section is written based on the gpt_researcher detailed report, its topic, and some subsection guidelines, plus similar content pulled from previous sections so we avoid repeating ourselves. This keeps things simple and cost-effective: we don't need to feed all previous sections into the next one, which would make things far more complex and expensive. By tackling each section one at a time and only pulling in a bit of relevant info from before, we get good results with just a few extra embedding-model calls. It does take a bit longer overall, but that's worth it for a high-quality, non-redundant final product.

If I’ve got this right, I’m up for helping push this idea forward. Here’s a quick plan on how we could tweak the current multi_agent workflow:

  1. After planning each subsection, we have researchers run depth_report research in parallel to produce the first drafts.
  2. Start the subsection writing process. We'll have a reviewer check the draft and look at relevant previous results by pulling content from the embedding storage. With the draft and that relevant content, the reviewer gives notes to the reviser, who makes changes based on those notes (similar to the current workflow). Once the reviewer confirms the section meets all guidelines, we split it into chunks, call the embedding model, and save the chunks to the embedding storage for future subsections (a rough sketch of this step follows below).
  3. Once all sections are done, we use the original workflow to write and publish the final report.

This is just my initial idea on how to implement it. Do you think it's better to handle redundant content directly at the reviser stage, or do you have any other thoughts?
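A very rough sketch of the step 2 loop, with the reviewer, reviser and embedding store passed in as placeholders (none of these names exist in the repo today):

```python
# Hypothetical per-section review loop for step 2 above.
from typing import Callable, Optional

def write_section(
    topic: str,
    draft: str,
    guidelines: list[str],
    retrieve_previous: Callable[[str], list[str]],            # relevant chunks from earlier sections
    review: Callable[[str, list[str], list[str]], Optional[str]],
    revise: Callable[[str, str], str],
    save_to_store: Callable[[str, str], None],
) -> str:
    while True:
        prior = retrieve_previous(topic)           # what other sections already cover
        notes = review(draft, guidelines, prior)   # reviewer sees draft + overlap context
        if notes is None:                          # reviewer accepts: all guidelines met
            break
        draft = revise(draft, notes)               # reviser applies the notes
    save_to_store(topic, draft)                    # chunk + embed for future subsections
    return draft
```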

danieldekay commented 5 days ago

This sounds like a good plan, @DandinPower!

I have experience with converting interview transcripts, which can jump between topics, into a coherent report that keeps the details (unlike what a "summarize this meeting" prompt would produce).

My workflow was:

  1. Chunk the interview into pieces (it can be hours long, and we use GPT-3.5-turbo).
  2. Create headlines for the chunks.
  3. Create a structure based on the headlines.
  4. Split the structure into chapters and write each one using the transcript chunks retrieved from a vector DB.
  5. Compile everything into the final report.

There is still some redundancy, but not as much as without the vector DB, and token efficiency increases a lot.
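Very roughly, the flow looked like this (pseudocode; `store` and `llm` are stand-ins, not an existing API):

```python
# Shape of the transcript-to-report pipeline described above.
from typing import Callable

def transcript_to_report(transcript: str, store, llm: Callable[[str], str],
                         chunk_size: int = 3000) -> str:
    # 1. chunk the (possibly hours-long) transcript
    chunks = [transcript[i:i + chunk_size] for i in range(0, len(transcript), chunk_size)]
    store.add(chunks)                                  # index chunks in the vector DB
    # 2. one headline per chunk
    headlines = [llm(f"Write a short headline for:\n{c}") for c in chunks]
    # 3. an ordered chapter structure built from the headlines
    outline = llm("Organize these headlines into a report outline:\n" + "\n".join(headlines))
    # 4. write each chapter from the transcript chunks retrieved for it
    chapters = [llm(f"Write the chapter '{line}' using only:\n{store.query(line)}")
                for line in outline.splitlines() if line.strip()]
    # 5. compile everything into the final report
    return "\n\n".join(chapters)
```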

In your proposed workflow I could imagine the editor (or another agent) gathering abstract summary knowledge from the researchers and writing a guided outline with headlines, specifying which facts/questions need to be addressed where. This outline could then be chunked and each part processed separately with knowledge from the vector DB.

assafelovic commented 5 days ago

This sounds great guys, who is helping with leading this PR? :)

DandinPower commented 5 days ago

@danieldekay Thank you for sharing your experience and workflow! Your approach with interview transcripts is quite interesting and offers some valuable insights.

@assafelovic I'm willing to take the lead on this PR and implement the improvements we've been discussing. Is there anything specific you'd like me to focus on or consider as I develop this feature?

danieldekay commented 4 days ago

@DandinPower, LangGraph also has a human-in-the-loop feature, and we might want to ask a human for editorial feedback at the highest abstraction level of the report structure. This could be what it takes to go from 60% quality to 85%.

DandinPower commented 4 days ago

@danieldekay That sounds like a good feature! In my opinion, there are two ways to incorporate human feedback into the workflow: (1) after the planner outlines each subsection topic, let humans provide high-level feedback on the direction of each section; (2) after each subsection reviser finishes its revision, the reviewer can not only give review notes but also ask a human to provide direction for further revision. Perhaps we can include a configuration option to let users decide whether to activate human feedback.
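For example, the opt-in could be as simple as something like this (the config key and where it's read from are just assumptions on my side):

```python
# Hypothetical opt-in human feedback hook; nothing here exists in the repo yet.
from typing import Optional

task = {
    "query": "...",
    "include_human_feedback": True,   # user decides whether to activate it
}

def maybe_ask_human(stage: str, content: str, task: dict) -> Optional[str]:
    """Called after the planner's outline (approach 1) or after each
    reviser pass (approach 2). Returns feedback text, or None to skip."""
    if not task.get("include_human_feedback"):
        return None
    print(f"\n[{stage}]\n{content}\n")
    feedback = input("Any feedback? (press Enter to skip): ").strip()
    return feedback or None
```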

assafelovic commented 4 days ago

@DandinPower ping me on Discord if you'd like, or we can open a channel for this feature and invite whoever would like to contribute/test it. Generally keep in mind that GPT Researcher is in charge of generating a research report based on research tasks, and the long detailed report leverages it multiple times. I assume the logic should live under the backend/report_type/detailed_report/ path. Looking forward and excited to see what comes out of it!

DandinPower commented 4 days ago

@assafelovic Okay, I'll ping you on Discord later! Thanks for the guidance. I will make sure I understand the detailed report logic before pushing forward.

danieldekay commented 2 days ago

@DandinPower , I am also reading the STORM paper (https://arxiv.org/pdf/2402.14207) which has loads more insights into possible processes.

DandinPower commented 2 days ago

@danieldekay Hey! Thanks for pointing me to this paper. I think the current GPT Researcher detailed report type was also inspired by it. I will take a look.