I am writing a module that uses AI to digest large quantities of information and, in a layered, stepwise fashion, cumulatively distill it into an intermediate structured form that can then be used as input for drafting a written human rights application.
I started with OpenAI GPT-4o but ran into its roughly 4,000-token completion (output) cap.
I then moved to OpenAI's "tools" (function calling) feature so the model would generate its output in "chunks" that, collected together, form the final output, which may be very long and exceed a model's completion cap.
I am now adapting the script to Anthropic's API (the port is almost complete).
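For reference, the chunk-collection tool I'm defining on the Anthropic side looks roughly like this (the save_chunk name and schema are placeholders of my own, not a final design):

```python
# Sketch of the tool the model is asked to call repeatedly, once per chunk.
# The tool does nothing server-side; my client code saves each chunk and
# reassembles them into the full document afterwards.
save_chunk_tool = {
    "name": "save_chunk",
    "description": (
        "Save one chunk of the structured summary. Call this repeatedly "
        "until the entire summary has been emitted, then stop."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "chunk_index": {"type": "integer", "description": "0-based position of this chunk"},
            "is_last": {"type": "boolean", "description": "True when no further chunks follow"},
            "content": {"type": "string", "description": "The chunk text itself"},
        },
        "required": ["chunk_index", "is_last", "content"],
    },
}
```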
It appears the API wants previous content sent back to it in order to continue the "conversation", but this may not scale for me. To summarize two months of evidence and provide the appropriate system content, I am already using 150,000 of the 200,000-token maximum context length. I will try to make this work by sending back only the most recent tool_use block paired with its tool_result and see whether the content flows properly into my tool (which saves the chunked content for later reassembly).
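Concretely, the trimmed follow-up I have in mind would pair only the most recent tool_use with its tool_result, something like the sketch below. Whether the API will accept a history cut down this far is exactly what I'm unsure about; the short stand-in user turn is there only because the Messages API seems to require the conversation to open with a user role message.

```python
# "response" is the previous API reply whose stop_reason was "tool_use".
last_tool_use = next(b for b in response.content if b.type == "tool_use")

trimmed_messages = [
    # Short stand-in for the original (huge) user prompt.
    {"role": "user", "content": "Continue emitting the remaining chunks of the summary."},
    # The assistant turn containing the tool_use block being answered.
    {"role": "assistant", "content": [last_tool_use]},
    # The tool_result acknowledging it, prompting the model to emit the next chunk.
    {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": last_tool_use.id,
                "content": "Chunk saved. Please continue with the next chunk.",
            }
        ],
    },
]
```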
In the first pass, the AI takes detailed content from files containing emails and text messages within a particular date range and produces a summarized, structured output. This pass uses most of the context window (the system prompt contains legal and medical background documentation, the user prompt contains the email/text message data), while the output is relatively small.
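Pass 1, in simplified form, looks like this (the model name, max_tokens value, and variable names such as background_documentation and email_and_text_data are illustrative):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Pass 1: one call per date range. The system prompt carries the legal and
# medical background; the user prompt carries the raw email / text message
# data for that range. The output is a comparatively small structured summary.
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=4096,                      # completion budget for the structured summary
    system=background_documentation,      # legal + medical background (large)
    messages=[
        {
            "role": "user",
            "content": f"Summarize the following evidence for {date_range}:\n\n{email_and_text_data}",
        }
    ],
)
structured_block = response.content[0].text
```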
In pass 2 I take all of those pass 1 output blocks and cumulatively combine them, which means both input and output sizes are expected to grow quite large. More precisely, I do token counting and fit as many input blocks as I can, factoring the expected output token size into the calculation. While Claude's 200,000-token context window is certainly better than GPT-4o's 128,000, I still need a scalable way to summarize vast quantities of data without hitting any output token limits.
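The pass 2 packing logic is essentially greedy bin-filling against a token budget. Roughly (estimate_tokens is a stand-in for whatever counter you use; the 4-characters-per-token heuristic below is only an approximation):

```python
# Greedy packing for pass 2: fit as many pass-1 blocks as possible into one
# request while reserving room for the system prompt and the expected output.
CONTEXT_WINDOW = 200_000
EXPECTED_OUTPUT_TOKENS = 4_096      # must also be reserved out of the window

def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); replace with a real counter.
    return len(text) // 4

def pack_blocks(blocks, system_prompt):
    budget = CONTEXT_WINDOW - estimate_tokens(system_prompt) - EXPECTED_OUTPUT_TOKENS
    batch, used = [], 0
    for block in blocks:
        cost = estimate_tokens(block)
        if used + cost > budget:
            break                   # remaining blocks go into the next pass-2 call
        batch.append(block)
        used += cost
    return batch, blocks[len(batch):]
```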
The question I should have asked up front: does Claude 3.5 Sonnet have any restriction on its maximum output/completion token count?
Is the tools feature the most appropriate feature for this case? If so, how can I avoid sending back past context so large that it quickly consumes the context window?
PS: I'm getting the following error because I'm not sending back the previous context when responding with tool call results:
Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.0: tool_result block(s) provided when previous message does not contain any tool_use blocks'}}
The "tool_use" block contains the most recent chunked output.
Say the final output contains ten chunks: does Claude expect me to send back only the most recent chunk's tool_use block, or all previous tool_use and tool_result blocks? This sequence is not at all clear to me. The requirement is to provide the system/user prompts and have the output chunked back to us such that the joined chunks form a single whole response (so the model must somehow maintain state across the initial call and the subsequent calls carrying the tool result content).
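To make the ten-chunk scenario concrete, here is roughly what the full-history version of the loop would look like (names such as save_chunk_tool and initial_user_prompt are my own placeholders). Every iteration resends everything sent so far, which is exactly the scalability problem I'm worried about:

```python
# Full-history chunk collection: after every tool_use, the entire transcript
# (original user prompt plus all prior tool_use/tool_result pairs) is resent.
messages = [{"role": "user", "content": initial_user_prompt}]
chunks = []

while True:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=4096,
        system=system_prompt,
        tools=[save_chunk_tool],
        messages=messages,
    )
    messages.append({"role": "assistant", "content": response.content})

    if response.stop_reason != "tool_use":
        break                                   # model has finished emitting chunks

    tool_use = next(b for b in response.content if b.type == "tool_use")
    chunks.append(tool_use.input["content"])    # save the chunk locally

    messages.append({
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use.id,
            "content": "Chunk saved. Continue.",
        }],
    })

final_output = "".join(chunks)
```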
And when providing a tool result, do I also need to send back the initial user and system prompts? If so, the scalability of this approach may be in jeopardy.
For background, my earlier design write-up on the OpenAI forum: https://community.openai.com/t/inception-based-design-for-the-ai-assisted-creation-of-a-written-human-rights-complaint/863669