Closed jeromemassot closed 3 months ago
I attempted to reproduce the problem by specifying and deploying an agent using your sample code, and the deployment worked for me as expected.
Ensure that you are using the same library versions in your development environment (e.g., wherever you are deploying the agent from). This can sometimes happen due to a mismatch between the Python library versions on the client side (i.e., dev environment) vs. server-side (i.e., Reasoning Engine deployed service).
If you still run into the issues after checking that, then you'll want to view your deployment logs in the Logs Explorer in the Console. Try looking there for any error messages related to deployment, and you might try filtering the logs by searching for the Reasoning Engine ID that you see during deployment:
create ReasoningEngine backing LRO: projects/962751530XXX/locations/us-central1/reasoningEngines/3471395703400431616/operations/4369265454217166848
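To filter logs by the Reasoning Engine ID, it first has to be pulled out of that operation name. A minimal sketch, assuming the deployment output has the shape shown above; `reasoning_engine_id` is a hypothetical helper name, not part of the SDK:

```python
import re
from typing import Optional

# Matches the resource path printed during deployment, e.g.
# projects/<project>/locations/<location>/reasoningEngines/<id>/operations/<op>
_ENGINE_RE = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<engine_id>\d+)"
)


def reasoning_engine_id(lro_line: str) -> Optional[str]:
    """Return the numeric Reasoning Engine ID found in a deployment log line, or None."""
    match = _ENGINE_RE.search(lro_line)
    return match.group("engine_id") if match else None
```

The returned ID can then be pasted into the Logs Explorer search box.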
When viewing the logs, you can also verify that the Python version in Reasoning Engine matches the one you expect based on your development environment, and look for package installation errors, dependency conflicts, or other issues. The Python version is inferred on the client side when deploying; if auto-detection picks the wrong version, you can pass sys_version as an argument to reasoning_engines.ReasoningEngine.create.
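As a rough sketch of setting sys_version explicitly, the value can be derived from the interpreter you are deploying from. `local_sys_version` is a hypothetical helper, and the commented-out call assumes `agent` and your requirements list are defined as elsewhere in this thread:

```python
import sys


def local_sys_version() -> str:
    """Return the running interpreter's version as 'major.minor'."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


# Assumed usage (not executed here); agent and requirements come from your own code:
# remote_app = reasoning_engines.ReasoningEngine.create(
#     agent,
#     requirements=requirements,
#     sys_version=local_sys_version(),
# )
```

Check the Reasoning Engine documentation for which Python versions the service accepts before relying on this value.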
I ran into this issue today as well and then found this GitHub issue. I looked for the logs in Cloud Logging and found them with the query log_id(reasoning_engine%2Fstderr) OR log_id(reasoning_engine%2Fbuild). For some reason the logs took a little while to appear, and just searching for logs with resource.type="aiplatform.googleapis.com/ReasoningEngine" didn't seem to work (maybe a logging bug).
Anyway, once I found logs for my reasoning engine, I saw an error like this:
DEFAULT 2024-08-01T00:37:37.523307Z AttributeError: Can't get attribute '_class_setstate' on <module 'cloudpickle.cloudpickle' from '/usr/local/lib/python3.11/site-packages/cloudpickle/cloudpickle.py'>
ERROR 2024-08-01T00:37:41.810830Z Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/code/app/__main__.py", line 28, in <module>
    main()
  File "/code/app/__main__.py", line 16, in main
    app=create_app()
I ran pip show to get the versions of google-cloud-aiplatform, langchain-google-vertexai, and langchain-core, which were the three items in the requirements section of my reasoning engine creation call (I was following the example in these docs).
However, Gemini itself suggested that I might have version skew in the cloudpickle package itself. So I ran pip show on cloudpickle and added a locked version to my requirements parameter for reasoning_engines.ReasoningEngine.create, and voila, I got past the error.
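The pip show step can be automated. This sketch reads the locally installed version with the standard library's importlib.metadata and appends it as an `==` pin. `pin_to_local` is a hypothetical helper, not an SDK function, and it leaves already-constrained entries alone:

```python
import re
from importlib import metadata
from typing import Callable, Iterable, List


def pin_to_local(
    requirements: Iterable[str],
    lookup: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Append '==<locally installed version>' to requirements that carry no version specifier."""
    pinned = []
    for req in requirements:
        req = req.strip()
        if re.search(r"[=<>!~]", req):
            # Already constrained (e.g. "cloudpickle==3.0.0"); keep as written.
            pinned.append(req)
            continue
        # Drop extras such as "[reasoningengine,langchain]" before the lookup.
        base_name = req.split("[", 1)[0].strip()
        pinned.append(f"{req}=={lookup(base_name)}")
    return pinned
```

For example, `pin_to_local(["cloudpickle"])` yields a single pin matching whatever cloudpickle version pickled the agent locally. Pinning only the packages involved in serialization may be safer than pinning everything, since pinning every package can introduce conflicts of its own.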
@draffensperger, thanks for posting the details of your experience. You are exactly right: if you get a serialization / cloudpickle error like that, it can help to pin cloudpickle==3.0.0 to solve the issue.
I've included that version pin in notebooks such as https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/reasoning-engine/tutorial_vertex_ai_search_rag_agent.ipynb in both the pip install lines at the top of the notebook and the package versions that get specified in the reasoning_engines.ReasoningEngine.create options.
We also try to document frequently encountered issues like this one at https://cloud.google.com/vertex-ai/generative-ai/docs/reasoning-engine/troubleshooting/deploy#cloudpickle_version.
Thank you both for the help with the cloudpickle issue :) I confirm that on my side I had no clear error log, but I did not make the same effort as @draffensperger to look for it. Best regards, Jerome
I have tried to add the cloudpickle locked version in the code, but I still have an issue.
remote_app = reasoning_engines.ReasoningEngine.create(
    agent,
    requirements=[
        "google-cloud-aiplatform[reasoningengine,langchain]==1.57.0",
        "langchain-google-alloydb-pg==0.4.1",
        "langchain-google-vertexai==1.0.4",
        "cloudpickle==3.0.0",
    ],
    display_name="PrebuiltAgent",
)
It seems that a LangChain module is missing; the error is triggered by the cloudpickle.loads() call.
{
insertId: "66ab939b0003bc8d2a8876e6"
logName: "projects/education-and-tests-422020/logs/reasoning_engine%2Fstderr"
receiveTimestamp: "2024-08-01T13:54:35.541482941Z"
resource: {2}
severity: "ERROR"
textPayload: "Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/code/app/__main__.py", line 28, in <module>
main()
File "/code/app/__main__.py", line 16, in main
app=create_app(),
File "/code/app/api/app.py", line 54, in create_app
router=python_file_api_builder.PythonFileApiBuilder(
File "/code/app/api/factory/python_file_api_builder.py", line 43, in __init__
self.obj = utils.get_object(self.file_name)
File "/code/app/api/factory/utils.py", line 35, in get_object
obj = get_local_object(obj_filename)
File "/code/app/api/factory/utils.py", line 54, in get_local_object
return cloudpickle.loads(f.read())
ModuleNotFoundError: No module named 'langchain_google_alloydb_pg.engine'"
timestamp: "2024-08-01T13:54:35.244877Z"
}
In fact, I removed all the version pins on the other modules and it worked.
remote_app = reasoning_engines.ReasoningEngine.create(
    agent,
    requirements=[
        "google-cloud-aiplatform[reasoningengine,langchain]",
        "langchain-google-alloydb-pg",
        "langchain-google-vertexai",
        "cloudpickle==3.0.0",
    ],
    display_name="PrebuiltAgent",
)
@jeromemassot, thanks for the update and letting us know the resolution. It might be the case that there were some transitive dependencies that were conflicting during the pip install. Glad to know that you were able to get it working!
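One possible contributing factor, not confirmed in this thread, is that a pinned version differed from the version installed where the agent was pickled, so the pickle referenced a module that the pinned release did not provide. The sketch below flags `==` pins that disagree with the local install; `mismatched_pins` is a hypothetical helper built on the standard library's importlib.metadata:

```python
import re
from importlib import metadata
from typing import Callable, Iterable, List, Optional, Tuple

# Captures "name" and "version" from entries like "pkg[extra1,extra2]==1.2.3".
_PIN_RE = re.compile(r"^\s*([A-Za-z0-9._-]+)(?:\[[^\]]*\])?\s*==\s*([^\s;]+)")


def mismatched_pins(
    requirements: Iterable[str],
    lookup: Callable[[str], str] = metadata.version,
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (package, pinned, installed) for every '==' pin that differs from the local install."""
    mismatches = []
    for req in requirements:
        match = _PIN_RE.match(req)
        if not match:
            continue  # unpinned or range-constrained entries are not checked
        name, pinned = match.group(1), match.group(2)
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed locally at all
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches
```

An empty result means every pinned package matches the development environment; any entry returned is worth aligning before redeploying.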
File Name
tutorial_alloydb_rag_agent.ipynb
What happened?
returns an InternalServerError: 500 The user created Reasoning Engine failed to start and cannot serve traffic. 13: The user created Reasoning Engine failed to start and cannot serve traffic when running in an Argolis project. However, AlloyDB is set up correctly and can be accessed, and other LangChain agents have been deployed in this Argolis project without any issue.
Relevant log output
No response