Netflix / metaflow

:rocket: Build and manage real-life ML, AI, and data science projects with ease!
https://metaflow.org
Apache License 2.0

ServiceException (code 500): run_number is out of range for type integer, when running with local metadata #1922

Open Ahmed-Soliman96 opened 1 month ago

Ahmed-Soliman96 commented 1 month ago

When I run a flow:

with Runner(flow_file=os.path.join(script_dir, "flow.py"), metadata="local") as runner:
    result = runner.run(max_num_splits=2000, max_workers=64, kwargs=input_parameters)

with metadata="local" I get this error:

File "/venv/lib/python3.8/site-packages/metaflow/plugins/metadata/service.py", line 471, in _request
    raise ServiceException(
metaflow.plugins.metadata.service.ServiceException: Metadata request (/flows/Flow/runs/1721143286111471) failed (code 500): "{\"err_msg\": {\"pgerror\": \"ERROR: value \\"1721143286111471\\" is out of range for type integer\nLINE 7: ... WHERE flow_id = 'Flow' AND run_number = '172114328...\n ^\n\", \"pgcode\": \"22003\", \"diag\": {\"message_primary\": \"value \\"1721143286111471\\" is out of range for type integer\", \"severity\": \"ERROR\"}}}"

https://github.com/Netflix/metaflow/blob/a617aa816dde0651d481fdfb6bbafe067667dc4a/metaflow/plugins/metadata/service.py#L471

This error never happens when I run without the metadata="local" parameter.

madhur-ob commented 1 month ago

@Ahmed-Soliman96 can you please paste the contents of your flow file here?

Also, does this occur when you don't use the Runner API? i.e. just with the CLI?

Ahmed-Soliman96 commented 1 month ago

@madhur-ob Thank you for your response. Unfortunately, I cannot paste the contents of the flow file because the script is confidential. However, I can confirm that this error does not occur when running the script from the CLI; it occurs only with the Runner API.

madhur-ob commented 1 month ago

Hi @Ahmed-Soliman96, I wasn't able to reproduce it on my side. At first glance, it looks like a Postgres issue, i.e. a wrong column type. Maybe the schema of the Postgres table you use is the underlying issue?

But regardless, I am not asking for the exact contents of the flow file, just something similar that makes use of these parameters: max_num_splits=2000, max_workers=64, among others.

Essentially, something reproducible with all the confidential parts removed that emulates what you are trying to do.
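
To make the column-type hypothesis above concrete, here is a minimal sketch that checks the run number from the error message against the range of PostgreSQL's 4-byte integer type. The constants and the helper fits_pg_integer are hypothetical names introduced for illustration. Reading the value as microseconds since the Unix epoch is an assumption based only on its magnitude, and the actual schema of the metadata service's table is not known from this thread.

```python
from datetime import datetime, timedelta, timezone

# Range of PostgreSQL's 4-byte signed `integer` type.
PG_INT_MIN = -2**31
PG_INT_MAX = 2**31 - 1
# Upper bound of PostgreSQL's 8-byte signed `bigint` type.
PG_BIGINT_MAX = 2**63 - 1

# Run number copied from the error message in the original report.
run_number = 1721143286111471


def fits_pg_integer(value: int) -> bool:
    """Return True if `value` fits in a PostgreSQL `integer` column."""
    return PG_INT_MIN <= value <= PG_INT_MAX


print(fits_pg_integer(run_number))   # False: too large for `integer`
print(run_number <= PG_BIGINT_MAX)   # True: would fit in `bigint`

# Assumption: the value is a timestamp in microseconds since the Unix epoch.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
as_time = epoch + timedelta(microseconds=run_number)
print(as_time.isoformat())
```

If the timestamp reading is right, the run ID was generated on the client rather than assigned by the service, which would be consistent with the failure appearing only when metadata="local" is passed.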