drfraser opened this issue 2 years ago (status: open)
I think something like what I described happened, and since the root cause is the bug in my code, I feel this issue ought to be closed. I have tried to reproduce this problem and can't, so the only ways to trigger it might be to manually delete the db-wal file, to set up a subtask/flow that does so, or to deliberately leave a subtask in Pending while the main flow ends.
Thanks a ton for the thorough issue and for trying to reproduce this! If you bump up against this again, please reopen this issue and we'll take a look. I'll also reopen if another user encounters this.
It has occurred again. The flow being run was the only live one at the time, and judging by what the UI shows, it did not even start properly (subsequent runs of this flow over the next few days have all been fine):
```
Flow could not be retrieved from deployment.
Traceback (most recent call last):
  File "/home/railml/venv/lib/python3.8/site-packages/prefect/engine.py", line 247, in retrieve_flow_then_begin_flow_run
    flow = await load_flow_from_flow_run(flow_run, client=client)
  File "/home/railml/venv/lib/python3.8/site-packages/prefect/client.py", line 104, in with_injected_client
    return await fn(*args, **kwargs)
  File "/home/railml/venv/lib/python3.8/site-packages/prefect/deployments.py", line 46, in load_flow_from_flow_run
    await storage_block.get_directory(from_path=None, local_path=".")
  File "/home/railml/venv/lib/python3.8/site-packages/prefect/filesystems.py", line 98, in get_directory
    shutil.copytree(from_path, local_path, dirs_exist_ok=True)
  File "/usr/lib/python3.8/shutil.py", line 557, in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
  File "/usr/lib/python3.8/shutil.py", line 513, in _copytree
    raise Error(errors)
shutil.Error: [('/home/railml/etl/orion.db-wal', './orion.db-wal', "[Errno 2] No such file or directory: '/home/railml/etl/orion.db-wal'")]
01:16:05 PM
```
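The traceback shows that the missing `orion.db-wal` lives inside `/home/railml/etl`, which is the directory the storage block copies. In other words, the SQLite database sits inside the flow's storage path.

Below is a minimal sketch for checking any storage directory for SQLite sidecar files. Two parts of it are assumptions rather than anything from Prefect:

- `find_sqlite_sidecars` is a hypothetical helper, not part of Prefect.
- The suffix list covers SQLite's WAL, shared-memory, and rollback-journal files.

```python
from pathlib import Path
from typing import List

# Suffixes SQLite appends to a database name for its transient sidecar files
# (write-ahead log, shared-memory index, rollback journal). Assumed list.
SIDECAR_SUFFIXES = ("-wal", "-shm", "-journal")


def find_sqlite_sidecars(root: str) -> List[Path]:
    """Return SQLite sidecar files currently present anywhere under `root`."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        # is_file() returns False (rather than raising) if the file vanished mid-scan.
        if path.is_file() and path.name.endswith(SIDECAR_SUFFIXES)
    )
```

For example, `find_sqlite_sidecars("/home/railml/etl")` would list `orion.db-wal` whenever the file happens to exist at that moment. An empty result does not rule out the problem, because these files come and go.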
Thanks for the heads up! I've reopened the issue and I'll give this a more thorough look when I've got a moment.
This happened again this morning, so if you can tell me where to put extra logging statements, I can probably give you more context in a day or two.
This looks like a weird issue related to a WAL file that exists only temporarily and is removed in the middle of the copytree operation. cc @cicdw
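This hypothesis can be checked without Prefect. `shutil.copytree` lists a directory first and copies each entry afterwards. A file deleted in between is therefore reported as `[Errno 2] No such file or directory` inside a `shutil.Error`, which is the same shape as the errors in the tracebacks in this thread.

Below is a minimal sketch. Deleting the file from the `ignore` callback is a test-only trick that stands in for SQLite removing its WAL file. It relies on CPython calling `ignore` after the listing and then copying from that same listing, which is an implementation detail. The file names are illustrative.

```python
import os
import shutil
import tempfile


def reproduce_vanishing_wal():
    """Simulate a WAL file disappearing while shutil.copytree is running."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "etl")
        dst = os.path.join(tmp, "copy")
        os.makedirs(src)
        for name in ("flow.py", "orion.db", "orion.db-wal"):
            with open(os.path.join(src, name), "w") as fh:
                fh.write("placeholder")

        def delete_wal_after_listing(directory, names):
            # copytree has already listed `directory` when it calls this hook,
            # so removing the file here mimics SQLite deleting it mid-copy.
            wal = os.path.join(directory, "orion.db-wal")
            if os.path.exists(wal):
                os.remove(wal)
            return set()  # ignore nothing; copytree still tries to copy the WAL

        try:
            shutil.copytree(src, dst, ignore=delete_wal_after_listing, dirs_exist_ok=True)
        except shutil.Error as exc:
            return exc.args[0]  # list of (source, destination, reason) tuples
    return []
```

Each returned tuple has the same `(source, destination, reason)` layout as the `shutil.Error` entries in the tracebacks. The other files in the directory are still copied successfully.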
Is this one still in progress or resolved?
This also occurred for me. Can you help me with this? It happens while my flow is already in production:
```
Downloading flow code from storage at 'D:\\PREFECT \\HOME\\DIRECTORY'
06:00:00 AM
Flow could not be retrieved from deployment.
Traceback (most recent call last):
  File "C:\PYTHON_VIRTUAL_ENVIRONMENT\DIRECTORY\Documents\prefect_ENV_PRD\env\Lib\site-packages\prefect\engine.py", line 318, in retrieve_flow_then_begin_flow_run
    flow = await load_flow_from_flow_run(flow_run, client=client)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PYTHON_VIRTUAL_ENVIRONMENT\DIRECTORY\Documents\prefect_ENV_PRD\env\Lib\site-packages\prefect\client\utilities.py", line 40, in with_injected_client
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PYTHON_VIRTUAL_ENVIRONMENT\DIRECTORY\Documents\prefect_ENV_PRD\env\Lib\site-packages\prefect\deployments.py", line 197, in load_flow_from_flow_run
    await storage_block.get_directory(from_path=deployment.path, local_path=".")
  File "C:\PYTHON_VIRTUAL_ENVIRONMENT\DIRECTORY\Documents\prefect_ENV_PRD\env\Lib\site-packages\prefect\filesystems.py", line 147, in get_directory
    copytree(from_path, local_path, dirs_exist_ok=True)
  File "C:\Program Files\Python311\Lib\shutil.py", line 561, in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\shutil.py", line 515, in _copytree
    raise Error(errors)
shutil.Error: [('D:\\PREFECT \\HOME\\DIRECTORY\\home\\prefect.db-shm', 'C:\\PYTHON_VIRTUAL_ENVIRONMENT\\DIRECTORY\\AppData\\Local\\Temp\\tmpymczar64prefect\\home\\prefect.db-shm', "[Errno 2] No such file or directory: 'D:\\\\PREFECT \\\\HOME\\\\DIRECTORY\\\\home\\\\prefect.db-shm'"), ('D:\\PREFECT \\HOME\\DIRECTORY\\home\\prefect.db-wal', 'C:\\PYTHON_VIRTUAL_ENVIRONMENT\\DIRECTORY\\AppData\\Local\\Temp\\tmpymczar64prefect\\home\\prefect.db-wal', "[Errno 2] No such file or directory: 'D:\\\\PREFECT \\\\HOME\\\\DIRECTORY\\\\home\\\\prefect.db-wal'")]
06:00:08 AM
```
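In this run, too, the copied directory contains the SQLite database (`home\prefect.db`), and this time both its `-wal` and `-shm` files vanished mid-copy.

If you control the copy step, one hedged workaround is to skip SQLite's transient sidecar files. The sketch below uses the standard `shutil.ignore_patterns` helper. It is plain `shutil`, not a Prefect option, and `copy_flow_code` is a hypothetical name. Keeping the database outside the directory the storage block copies avoids the race altogether.

```python
import shutil

# Transient files SQLite keeps next to a database. Loading flow code does not
# need them, and SQLite may delete them while the copy is running.
SQLITE_TRANSIENT = shutil.ignore_patterns("*.db-wal", "*.db-shm", "*.db-journal")


def copy_flow_code(src, dst):
    """Copy `src` into `dst`, skipping SQLite WAL/shared-memory/journal files."""
    # copytree passes `ignore` down to every subdirectory, so nested
    # databases such as home/prefect.db are covered as well.
    return shutil.copytree(src, dst, ignore=SQLITE_TRANSIENT, dirs_exist_ok=True)
```

Note that the main `.db` file is still copied. A copy taken while the database is being written may be inconsistent, but the copy itself no longer fails on a vanished sidecar file.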
First check
Bug summary
I am not sure what causes this bug. I can guess why orion.db-wal might be generated, but I have no idea what is going on internally.
My flows, which are fairly simple, are running fine, and this error has occurred only once. Could it be that my flows are overlapping in time and something is getting scrambled? I would need more background knowledge before I could help debug this.
The log message below was the only line associated with this run; it is as if the flow threw an error at its very start.
Reproduction
Error
Versions
Additional context
The flow before this one was in a bad state: a bug in my code caused a subtask to enter a Pending state while the main flow process ended. I am not sure how Prefect should handle that scenario, but from what I have read, SQLite deletes the db-wal file when the last database connection is closed.
What if the first flow ended after the second flow had already started, then the stuck subtask ended and the db-wal file was deleted when it shouldn't have been (because the first flow broke)? Could that be why the second flow failed with this error?
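The claim that SQLite removes the WAL file when connections close can be checked with Python's built-in `sqlite3` module. In WAL mode, `<database>-wal` exists after a write while a connection is open. When the last connection closes, SQLite checkpoints the log and deletes the file. That is the default behaviour; SQLite's persistent-WAL option changes it.

Below is a minimal sketch. The database name `demo.db` and the table `runs` are arbitrary.

```python
import os
import sqlite3
import tempfile


def wal_exists_before_and_after_close():
    """Return (wal_present_while_open, wal_present_after_close)."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        wal_path = db_path + "-wal"  # SQLite names the log <database>-wal

        conn = sqlite3.connect(db_path)
        conn.execute("PRAGMA journal_mode=WAL")  # switch this database to write-ahead logging
        conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY)")
        conn.execute("INSERT INTO runs DEFAULT VALUES")
        conn.commit()
        while_open = os.path.exists(wal_path)

        conn.close()  # last connection closes: SQLite checkpoints and removes the WAL
        after_close = os.path.exists(wal_path)
    return while_open, after_close
```

A long-lived process, such as the stuck subtask described above, may hold the last connection. In that case the deletion happens whenever that process finally closes it, which can fall in the middle of another run's directory copy.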