DandinPower opened this issue 1 month ago
Hey @DandinPower, that's a great discovery and definitely a huge improvement to the experience. At the moment (mostly for cost purposes) the report indeed does not take previous subtopics into consideration. To enable this, every iteration would need to see the entire report generated before it, which would be very costly in terms of LLM calls. I guess it's a tradeoff, but definitely something we can consider adding as an option.
Hello @assafelovic, thank you for your response! I understand that the current multi-agent workflow already incurs significant costs from LLM calls, and it's always important to consider the trade-off between cost and utility. However, I have a preliminary idea: perhaps we could sync with the parallel research groups after each draft iteration. First, we allow the chief reviewer to view all drafts, then apply "report-level guidelines" to create dynamic guidelines for each subsection reviewer. For instance, if both Section A and Section B cover a certain concept, the chief reviewer could instruct one reviewer with a dynamic guideline like "Do not cover [this concept]." The original reviewer would then use both the standard and dynamic guidelines to provide notes to the revisor.

With this workflow, the additional cost would be just one LLM call per review iteration, plus a slight increase in latency for synchronization. This approach also enables "entire report"-level guidelines without repeatedly restarting the subsection research, which could otherwise waste numerous LLM calls. I will attempt to develop an agent flow based on this concept and compare the average cost and latency with the current workflow to ensure the costs are acceptable. If you're interested, I would be happy to share the results with you!
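To make the idea concrete, here is a minimal sketch of the chief-reviewer sync step. All function names are hypothetical, and the keyword-overlap check is a stand-in for what would really be a single chief-reviewer LLM call per iteration:

```python
from typing import Dict, List, Tuple

def find_overlaps(drafts: Dict[str, str]) -> List[Tuple[str, str, set]]:
    """Naively flag concept overlap between section drafts via shared keywords.
    A real implementation would make one chief-reviewer LLM call here."""
    overlaps = []
    names = list(drafts)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = set(drafts[a].lower().split()) & set(drafts[b].lower().split())
            # keep only longer words so stopwords don't count as "concepts"
            shared = {w for w in shared if len(w) > 6}
            if shared:
                overlaps.append((a, b, shared))
    return overlaps

def dynamic_guidelines(drafts: Dict[str, str]) -> Dict[str, List[str]]:
    """Turn detected overlaps into per-section guidelines for the next round."""
    guidelines = {name: [] for name in drafts}
    for a, b, shared in find_overlaps(drafts):
        # arbitrarily let the first section keep the concept; exclude it from the second
        for concept in sorted(shared):
            guidelines[b].append(f"Do not cover '{concept}'; it is handled in '{a}'.")
    return guidelines

drafts = {
    "Section A": "An overview of transformer attention mechanisms.",
    "Section B": "Details of attention mechanisms and scaling laws.",
}
print(dynamic_guidelines(drafts))
```

Each subsection reviewer would then merge these dynamic guidelines with the static ones before writing notes for the revisor.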
Hey, I played with the researcher and noticed the same issue. IMO the tradeoff is well worth it, since the alternative is to remove redundant sections by hand. Maintaining coherence across the entire report seems paramount for real-world applications and complex subjects. Maybe it could be an option so users could choose. Excited about possible architecture iterations in this direction and would love to help!
Hello @antoremin, I have forked the repo and written a brief draft about the modification design. If you are interested, you are welcome to check the `multi_agent_v2` folder for details!
Hey @DandinPower would love help with a PR for this! Currently working on a new front end UX/UI
Hello @assafelovic, thank you for your invitation! I am willing to help with a PR after I finish the feature about the workflow I mentioned before.
@DandinPower @antoremin Thought about it deeper and I think there's a pretty decent approach that can work well: Basically we can save all written content in a vector db and for every new section retrieve relevant information from it and consider it when generating new content. This way we're not overloading redundant content in the context for each section generation and it will most likely solve the issue. Wdyt?
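A minimal sketch of that idea, assuming a toy bag-of-words "embedding" in place of a real embedding model and vector DB (the class and function names here are illustrative, not part of the codebase):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: bag-of-words counts. A real pipeline would call
    an embedding model and store vectors in a vector DB instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class WrittenContentStore:
    """Accumulates finished section chunks; retrieves context for new sections."""
    def __init__(self):
        self.chunks = []  # (text, vector) pairs

    def add(self, text: str):
        self.chunks.append((text, embed(text)))

    def retrieve(self, query: str, k: int = 2):
        qv = embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(qv, c[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = WrittenContentStore()
store.add("Transformers rely on self attention to mix token information.")
store.add("Reinforcement learning optimizes a reward signal over time.")
context = store.retrieve("attention in transformers", k=1)
print(context)
```

The writer for each new section would receive only the top-k retrieved chunks as "already covered" context, rather than every previous section verbatim.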
Hello @assafelovic @antoremin,
I think this is a very good approach to reducing redundant content across subsections. It also allows generating content fully in parallel, since we can retrieve the data independently. Additionally, replacing extra LLM calls with embedding models and vector searches is generally quicker and cheaper. However, I have some concerns about the vector similarity-based approach:
The figure shows the three subsection-topic embeddings (red points) close to each other, while the gray points are the written content chunks. The shallow yellow circles represent the relevant range given the similarity threshold we set. This illustrates how difficult it is to retrieve the data without redundancy while still getting all the content we want.
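The concern can be reproduced with a tiny numeric example. Using made-up 2-D coordinates in place of real embeddings (all values here are illustrative), a single similarity radius both assigns one chunk to two topics and misses another chunk entirely:

```python
import math

# Toy 2-D embeddings standing in for the figure: three close subsection
# topics (red points) and written-content chunks (gray points).
topics = {"A": (0.0, 0.0), "B": (0.3, 0.0), "C": (0.15, 0.25)}
chunks = {"c1": (0.1, 0.05), "c2": (0.25, 0.1), "c3": (1.0, 1.0)}

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

threshold = 0.2  # the "shallow yellow circle" radius
hits = {t: [c for c, cv in chunks.items() if dist(tv, cv) <= threshold]
        for t, tv in topics.items()}
print(hits)
```

Here `c2` falls inside the radius of both topics B and C (potential redundancy), while `c3` is retrieved by no topic at all (lost content), whatever single threshold we pick.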
This is a great point! But consider that what we want is the LLM to take previously written sections into consideration when writing the next sections. So it might be enough to get similar chunks just for reasoning of generating new content. I'd be happy to see how we can push this forward anyone want to take a stab at it? @antoremin @DandinPower
Hey @assafelovic @antoremin, sorry for the mix-up earlier. Here's my updated take on your idea: we generate content section by section. For each section, we base the writing on the gpt_researcher detailed report, its topic, and the subsection guidelines, and we pull in similar content from previous sections to avoid repeating ourselves. This keeps things simple and cost-effective: we don't need to feed all previous sections into the next one, which would make things far more complex and expensive. By tackling sections one at a time and retrieving only a little relevant context from before, we get good results with just a few extra embedding-model calls. It may take a bit longer overall, but that's worth it for a high-quality, non-redundant final product.
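The section-by-section loop described above could look roughly like this. `retrieve_similar` and `write_section` are hypothetical stand-ins for the vector-store lookup and the LLM writer call in the real `multi_agent` workflow:

```python
def retrieve_similar(store: list, topic: str, k: int = 2) -> list:
    """Stand-in retrieval: return stored entries sharing a word with the topic.
    A real system would query a vector DB here."""
    words = set(topic.lower().split())
    matches = [c for c in store if words & set(c.lower().split())]
    return matches[:k]

def write_section(topic: str, guidelines: str, prior_context: list) -> str:
    """Stand-in for the LLM writer; the real prompt would include the
    detailed report, the topic, the guidelines, and the retrieved context."""
    return f"[{topic}] ({guidelines}; avoiding overlap with {len(prior_context)} prior chunks)"

store, report = [], []
for topic in ["Model architecture", "Training procedure", "Architecture trade-offs"]:
    context = retrieve_similar(store, topic)
    section = write_section(topic, "follow subsection guidelines", context)
    report.append(section)
    store.append(topic.lower())  # index this section's topic; a real system
                                 # would chunk and embed the section text
print(len(report))
```

The third section retrieves context from the first (both mention "architecture"), which is exactly the signal the writer needs to avoid restating it.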
If I’ve got this right, I’m up for helping push this idea forward. Here’s a quick plan for how we could tweak the current `multi_agent` workflow: run the `depth_report` research in parallel for the first draft.

This sounds like a good plan, @DandinPower!
I have experience with converting transcripts of interviews, which might jump between topics, into a coherent report that keeps details (unlike a "summarize this meeting" prompt would).
My workflow was:
There is still some redundancy, but not as much as without the vector DB, and token efficiency increases a lot.
In your proposed workflow I could imagine the editor (or someone else) gathering abstract summary knowledge from the researchers, and writing a guided outline with headlines, and what facts/questions need to be addressed where. This outline could be chunked and processed separately with knowledge from the vectorDB.
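One possible shape for that guided outline, so each heading and its owned facts/questions can be chunked and queried against the vector DB independently (the field names here are illustrative, not an existing schema):

```python
from dataclasses import dataclass, field

@dataclass
class OutlineSection:
    """One headline of the editor's guided outline."""
    headline: str
    must_address: list = field(default_factory=list)  # facts/questions owned here

outline = [
    OutlineSection("Background", must_address=["What problem does the tool solve?"]),
    OutlineSection("Architecture", must_address=["How do agents coordinate?"]),
]
# Each (headline, question) pair becomes an independent retrieval query
# against the vector DB, keeping ownership of each fact unambiguous.
queries = [(s.headline, q) for s in outline for q in s.must_address]
print(len(queries))
```

Because every fact/question is assigned to exactly one headline, two sections can no longer both claim the same concept.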
This sounds great guys, who is helping with leading this PR? :)
@danieldekay Thank you for sharing your experience and workflow! Your approach with interview transcripts is quite interesting and offers some valuable insights.
@assafelovic I'm willing to take the lead on this PR and implement the improvements we've been discussing. Is there anything specific you'd like me to focus on or consider as I develop this feature?
@DandinPower, LangGraph also has a human-in-the-loop feature, and we might also want to ask a human at the highest abstraction level of the report structure to provide editorial feedback. This could be what it takes to bring quality from 60% to 85%.
@danieldekay That sounds like a good feature! In my opinion, I think there are two approaches to incorporate human feedback into the workflow: (1) After the planner outlines each subsection topic, we allow humans to provide high-level feedback on the direction of each section. (2) After each subsection revisor finishes its revision, not only can the reviewer give review notes, but it can also ask humans to provide direction for further revision. Perhaps we can include a configuration option to let users decide whether they want to activate human feedback or not.
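A minimal sketch of that opt-in configuration, with `ask_human` as a hypothetical injected callback (in the real workflow this could map onto LangGraph's human-in-the-loop interrupts instead):

```python
def review_with_optional_human(draft: str, config: dict, ask_human=None) -> str:
    """Return review notes; append human direction only when enabled in config."""
    notes = f"Automated review of: {draft[:30]}"
    if config.get("human_feedback", False) and ask_human is not None:
        notes += f"\nHuman direction: {ask_human(draft)}"
    return notes

# Disabled (default): behaves exactly like the current workflow.
print(review_with_optional_human("Section draft...", {}))

# Enabled: the callback supplies editorial direction at review time.
def fake_human(draft):
    return "Merge this with Section B."

print(review_with_optional_human("Section draft...", {"human_feedback": True}, fake_human))
```

Keeping the hook behind a single config flag means users who don't want the extra latency lose nothing.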
@DandinPower ping me on Discord if you'd like, or we can open a channel for this feature and invite whoever would like to contribute/test it. Generally keep in mind that GPT Researcher is in charge of generating a research report based on research tasks, and the long detailed report leverages it multiple times. I assume the logic should live under the `backend/report_type/detailed_report/` path. Looking forward and excited to see what comes out of it!
@assafelovic Okay, I'll ping you on Discord later! Thanks for your guidelines. I will make sure I understand the detailed report logic before pushing forward with the progress.
@DandinPower , I am also reading the STORM paper (https://arxiv.org/pdf/2402.14207) which has loads more insights into possible processes.
@danieldekay Hey! Thanks for introducing this paper. I think the current GPT researcher's detailed report type is also inspired by this paper. I will take a look at it.
Hello, I have tried the multi-agent implementation found in the `multi-agents` folder, after first reading the multi-agent blog. I discovered that even though each subsection focuses on a different topic, the content easily overlaps and discusses the same concepts across subsections. As a result, the final report often contains a lot of redundant content, which is not useful at all.

I initially tried to add guidelines like "Each subsection must not have redundant content across different subsections." However, since the reviewer and reviser agents only work on individual subsections, the guidelines can only be applied at the subsection level, not to the entire report. Consequently, the reviewers and revisers are unaware of what is happening in other subsections, making it difficult to resolve the redundancy problem.
I am considering a solution where we have a chief reviewer and reviser. After each subsection completes its research, the chief reviewer and reviser would evaluate the final research across all subsections, ensure there is no redundant content, and provide revision directions to each subsection group to restart their research.
I think this kind of workflow will make the whole process more complex, increase token waste, and cause higher latency. However, I believe that if we can set global guidelines, such as "Each subsection must not have redundant content across different subsections," it can improve the final report's robustness and usefulness.