Closed Cdingram closed 6 days ago
We made a temporary fix that should solve your issue. It's been released in version 1.14.
Let us know if it works now. Thanks for taking an interest in our library and for reporting this bug.
I have this issue when using the ScriptCreatorGraph when the page being accessed and passed to the LLM is very long, e.g.:
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, your messages resulted in 228910 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
Is the same fix needed for ScriptCreatorGraph as well?
ScriptCreatorGraph is the only graph that does not support chunking at the moment. If the request is too long, it simply fails. I don't know what the design principle behind this limitation was, but unfortunately it's there.
Thanks very much. I had a look at the GenerateScraperNode module and compared it with the GenerateAnswerNode module. I can see that, as you say, GenerateScraperNode simply doesn't support chunks. (Currently the ScriptCreatorGraph won't attempt to provide it with chunks anyway.)
I suspect the reason for this is that chunking works for extraction but not for script generation. If you chunk up the content of a page and ask the LLM to convert each chunk to structured data, you can simply combine the per-chunk results into a single result and return it to the user. The same approach doesn't work when generating a script: you would be left with one script per chunk, and each script could differ if the HTML structure of the page varies between chunks.
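As a rough illustration of why the chunk-and-combine approach works for extraction (the function names and the ~4-characters-per-token heuristic below are hypothetical stand-ins, not ScrapeGraphAI's actual implementation):

```python
# Hypothetical sketch: split a long page into chunks that fit a token
# budget, "extract" from each chunk, then merge the per-chunk results
# into one answer. extract_rows() stands in for an LLM call; the chunk
# size uses a crude ~4-characters-per-token estimate, not a real tokenizer.

def chunk_text(text: str, max_tokens: int = 100_000) -> list[str]:
    """Split text into pieces that fit within an approximate token budget."""
    max_chars = max_tokens * 4  # crude heuristic: ~4 chars per token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def extract_rows(chunk: str) -> list[str]:
    """Stand-in for an LLM extraction call on one chunk."""
    return [line for line in chunk.splitlines() if line.strip()]

def extract_all(page: str, max_tokens: int = 100_000) -> list[str]:
    """Run extraction per chunk and concatenate the structured results."""
    results: list[str] = []
    for chunk in chunk_text(page, max_tokens):
        results.extend(extract_rows(chunk))  # per-chunk results just concatenate
    return results
```

The key property is in the last loop: structured results from each chunk concatenate cleanly, whereas N generated scripts would not merge into one script.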
However, I think it may be worth trying a few different approaches to solve this:

- Update GenerateScraperNode to use the first chunk only, and accept that the script may not cater for data that falls outside of that chunk if it is formatted differently. I think in many cases this would work fine, as in a simple example of a large table, long set of comments, etc., the HTML structure of the later parts of the page is probably very similar to the earlier parts.

Do those sound like they could work? I am happy to try them out when I have some free time.
hi @tm-robinson if you want you can update the GenerateScraperNode
@VinciGit00 I've added a PR for the simpler way of fixing GenerateScraperNode. Will work on the more complex solution at some point soon hopefully.
Closing this as fixes (both permanent and temporary) were published both on beta and on stable.
Describe the bug
When doing some crawls, I get the following error:
Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, your messages resulted in 129936 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
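One way to catch this before the API call is to estimate the prompt size client-side. A rough sketch (the ~4-characters-per-token ratio is only an approximation of OpenAI tokenization; a real tokenizer such as tiktoken gives exact counts):

```python
# Rough pre-flight check against the 128k-token context limit reported
# in the error above. The chars/4 estimate is a heuristic, not the
# model's real tokenizer, so leave headroom in practice.

MAX_CONTEXT_TOKENS = 128_000

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token on average."""
    return len(text) // 4

def fits_context(messages: list[str], limit: int = MAX_CONTEXT_TOKENS) -> bool:
    """Return True if the combined messages likely fit the context window."""
    return sum(estimate_tokens(m) for m in messages) <= limit
```

A check like this lets a caller truncate or chunk the page before the request instead of receiving a BadRequestError back from the API.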
To Reproduce
from scrapegraphai.graphs import SearchGraph

graph_config = {
    "llm": {
        "api_key": <openai_key>,
        "model": "gpt-4o-mini"
    },
    "max_results": 6
}

search_graph = SearchGraph(prompt="Get me urls to pepe wearing a tux memes", config=graph_config)
result = search_graph.run()
Don't ask, it was a customer query lol.