Closed portoaj closed 1 month ago
Oh, you're right, we will add it.
Hi, I took the liberty of working on this error.
As the issue title says, the error occurs when graph executions are nested inside one another. For example, in SearchGraph, the IteratorNode calls multiple instances of SmartScraperGraph, creating a nested graph structure.
What happens at the code level is the following.
During execution, every graph tries to capture the token information from OpenAI calls using an OpenAI handler: with get_openai_callback() as cb:
We end up with the following structure:
# SearchGraph is executed
with get_openai_callback() as cb1:
    # SearchInternetNode executed
    # Token information gathered
    # IteratorNode executed
        # SmartScraperGraph executed
        with get_openai_callback() as cb2:
            # OpenAI handler gets passed to cb2. cb1 loses the handler (~paused).
            # Whole nested graph executed
            # Token information gathered by cb2
        # OpenAI handler released
    # cb1 resumes and re-obtains the handler.
    # No token information is available: it has been "consumed" by cb2
To fix the error we can create a CustomContextManager that manages exclusive access to the OpenAI handler.
If the handler is already held by an outer graph, the manager yields a None object; otherwise get_openai_callback continues to work as usual.
custom_openai_callback.py
import threading
from contextlib import contextmanager

from langchain_community.callbacks import get_openai_callback


class CustomOpenAiCallbackManager:
    # Class-level lock shared by all instances, so only one graph holds the handler at a time.
    _lock = threading.Lock()

    @contextmanager
    def exclusive_get_openai_callback(self):
        # Non-blocking acquire: a nested graph cannot take the lock and receives None.
        if CustomOpenAiCallbackManager._lock.acquire(blocking=False):
            try:
                with get_openai_callback() as cb:
                    yield cb
            finally:
                CustomOpenAiCallbackManager._lock.release()
        else:
            yield None
base_graph.py
self.callback_manager = CustomOpenAiCallbackManager()
[...]
with self.callback_manager.exclusive_get_openai_callback() as cb:
    [...]
    if cb is not None:
        # update exec_info
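The effect of the exclusive manager can be sketched without any LangChain dependency. The example below reuses the same assumption of a single "current" handler slot; FakeHandler, fake_get_callback, fake_llm_call and ExclusiveCallbackManager are hypothetical names that only mirror the pattern above.

```python
import threading
from contextlib import contextmanager


class FakeHandler:
    """Hypothetical stand-in for a token-counting callback handler."""

    def __init__(self):
        self.total_tokens = 0


_current = {"handler": None}  # assumption: a single "current handler" slot


@contextmanager
def fake_get_callback():
    previous = _current["handler"]
    handler = FakeHandler()
    _current["handler"] = handler
    try:
        yield handler
    finally:
        _current["handler"] = previous


def fake_llm_call(tokens):
    handler = _current["handler"]
    if handler is not None:
        handler.total_tokens += tokens


class ExclusiveCallbackManager:
    _lock = threading.Lock()  # shared by every instance

    @contextmanager
    def exclusive_callback(self):
        # Only the first (outermost) caller gets a handler; nested callers get None.
        if ExclusiveCallbackManager._lock.acquire(blocking=False):
            try:
                with fake_get_callback() as cb:
                    yield cb
            finally:
                ExclusiveCallbackManager._lock.release()
        else:
            yield None


def run_exclusive():
    outer, inner = ExclusiveCallbackManager(), ExclusiveCallbackManager()
    with outer.exclusive_callback() as cb1:
        fake_llm_call(10)
        with inner.exclusive_callback() as cb2:  # lock already held, so cb2 is None
            fake_llm_call(100)  # still reported to cb1
        return cb1.total_tokens, cb2
```

Because the lock is class-level and acquired without blocking, an unrelated graph running concurrently in another thread would also receive None while the first one holds it. That may be worth keeping in mind if graphs are ever run in parallel.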
Result
   node_name       total_tokens  prompt_tokens  completion_tokens  successful_requests  total_cost_USD  exec_time
0  SearchInternet  170           161            9                  1                    0.000030        3.868281
1  GraphIterator   46841         46456          385                5                    0.007199        10.237838
2  MergeAnswers    1152          825            327                1                    0.000320        3.569431
3  TOTAL RESULT    48163         47442          721                7                    0.007549        17.675550
We lose the detailed cost information for nested graphs: we do not know how the cost inside GraphIterator is divided (5 calls to SmartScraperGraph, each composed of FetchNode, ParseNode...).
Obtaining this kind of detailed information would require more engineering on the CustomOpenAiCallback.
I can work on this in the next few days.
I think that for now it is already good to have at least the complete cost of an execution, so I opened a PR for this issue. See if you like the proposed solution.
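As a rough illustration of what the per-subgraph breakdown mentioned above could look like, here is one possible sketch. It is not the design from the PR: UsageNode and UsageTracker are hypothetical names, and the idea is simply to keep a stack of nested scopes so each scope records its own usage and totals roll up to the parent.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class UsageNode:
    """Hypothetical record of the tokens spent in one graph or node scope."""

    name: str
    own_tokens: int = 0
    children: list = field(default_factory=list)

    @property
    def total_tokens(self):
        # Own usage plus everything spent in nested scopes.
        return self.own_tokens + sum(c.total_tokens for c in self.children)


class UsageTracker:
    def __init__(self):
        self._stack = []

    @contextmanager
    def track(self, name):
        node = UsageNode(name)
        if self._stack:
            self._stack[-1].children.append(node)  # attach to the enclosing scope
        self._stack.append(node)
        try:
            yield node
        finally:
            self._stack.pop()

    def record(self, tokens):
        # Attribute usage to the innermost active scope.
        if self._stack:
            self._stack[-1].own_tokens += tokens


def demo():
    tracker = UsageTracker()
    with tracker.track("GraphIterator") as iterator:
        for i in range(2):
            with tracker.track(f"SmartScraperGraph[{i}]"):
                tracker.record(100)
    return iterator.total_tokens, [(c.name, c.total_tokens) for c in iterator.children]
```

With a structure like this, the GraphIterator row could still show the aggregate while each nested SmartScraperGraph keeps its own figures.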
Hi, please update to the new version.
Describe the bug
If you run a graph such as the SearchGraph, the only outputs from the graph_exec_info are from the SearchGraph itself; they do not include the child SmartScraperGraph instances used by the GraphIteratorNode. Since the GraphIteratorNode is likely using most of the tokens that the model actually needs, this could lead to people massively underestimating how much they're spending on queries/tokens.
To Reproduce Here's code to reproduce the issue:
Exec Info output:
   node_name       total_tokens  prompt_tokens  completion_tokens  successful_requests  total_cost_USD  exec_time
0  SearchInternet  231           213            18                 1                    0.000043        3.495771
1  GraphIterator   0             0              0                  0                    0.000000        4.696635
2  MergeAnswers    245           236            9                  1                    0.000041        0.364202
3  TOTAL RESULT    476           449            27                 2                    0.000084        8.556608
Expected behavior
I'd expect the GraphIterator to show the tokens that it used instead of 0. Alternatively, it should find all of the subgraphs used during the running of this graph and either print those within this graph_exec_info, i.e. 0 SearchInternet 1 GraphIterator 2 SmartScraperGraph 3...