Open tomasz-sodzawiczny opened 1 year ago
Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! 🙏
Describe the bug
FlyteRemote.sync_execution() (and other methods that rely on it, e.g. FlyteRemote.wait() and FlyteRemote.execute() with wait=True) occasionally fails with an exception. The offending line is remote.py#L1644 (I pasted the full stack trace in the "Screenshots" section below).
The error only happens while the task is running (and is in some specific state). It seems to happen for us mostly when the tasks inside the conditional take some time to schedule (e.g. tasks with resource requirements that trigger a cluster scale-up).
This pretty much renders wait() unusable for us; we had to write our own wait wrapper that has a try..except around the sync_execution() call to handle this case.
Expected behavior
It should not raise? ;)
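For anyone hitting the same thing, the wait wrapper mentioned under "Describe the bug" can be sketched roughly as below. This is only an illustration, not the wrapper we actually run: the helper name wait_tolerating_sync_errors and all of its parameters are hypothetical, and the exception types to swallow are left to the caller. The sketch takes plain callables so it does not depend on flytekit itself.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_tolerating_sync_errors(
    sync: Callable[[], T],
    is_done: Callable[[T], bool],
    tolerated: Tuple[Type[BaseException], ...] = (Exception,),
    timeout_s: float = 3600.0,
    poll_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Poll `sync` until `is_done(result)` is true, ignoring `tolerated` errors.

    Hypothetical helper: `sync` is assumed to wrap something like
    `remote.sync_execution(execution, sync_nodes=True)`.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            result = sync()
            if is_done(result):
                return result
        except tolerated:
            # Assumed to be the intermittent failure seen while nodes are
            # still being scheduled; retry on the next poll.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"execution not done after {timeout_s} seconds")
        sleep(poll_interval_s)
```

With flytekit this would be called along the lines of wait_tolerating_sync_errors(lambda: remote.sync_execution(execution, sync_nodes=True), lambda e: e.is_done), ideally with tolerated narrowed to the exception type from the stack trace rather than the broad default, so that unrelated failures still surface.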
Additional context to reproduce
No response
Screenshots
Stack trace: