Open wild-endeavor opened 2 years ago
@wild-endeavor problem is what if the process crashes? The SUCCESS file is already supported by co-pilot. It is not reliable. SYS_PTRACE allows you to know when the process has exited, potentially without writing the task.
After copilot is done uploading the outputs, the process should exit. The raw container plugin should be changed to monitor the sidecar container (if it doesn't already). Usual task-timeouts apply. this is already the case.
Also improve documentation a bit. Documentation should state precisely what the contract is between the raw container, and the Flyte system. If it's necessary to configure the cluster a certain way in order to run this task type we should state so clearly.
Already an issue for this.
We also need to update docs and examples if we are going with the SUCCESS
file convention.
Pushing to 1.0.1 for now for further discussion.
@pingsutw , can you leavea comment explaining why this was closed?
@eapolinario Spotify is still using sys_ptrace in the raw container, so we keep it in flytecopilot. btw, we already have file access mode (Upload the literal once the SUCCESS file is present) in copilot. These two modes (file access and sys_ptrace) are configurable now.
Revive https://github.com/flyteorg/flyteplugins/pull/264/ and introduce the ability to configure the watcher type in the flytecopilot config.
What do you think of making the registration of raw containers optional?
We register it as a task type here, but we could provide a flag to control the inclusion of raw-container
as a valid task type.
Motivation: Why do you think this is important?
We've been getting more questions like this in the open-source community.
Is there a way we can improve the operation a bit. Why do we need to have special permissions?
Getting inputs into the system: Run a light-weight image as an init container - it should do nothing except run flytecopilot or something and just download the inputs.pb file and deserialize them. (Existing behavior where primitives are unpacked into
{{ .inputs.x }}
should also continue.)Getting outputs out of the system: The user-container should write one of the following:
The sidecar container should continue to be there. But instead of monitoring the process, it should monitor the outputs folder, which should be cross-mounted.
After it detects one of the two output files, it should begin a timer. After some amount of time, if an empty file named
SUCCESS
isn't present, it should fail the task.Once the
SUCCESS
file is present, it should (serialize and) upload the file to the right location.If the task doesn't have an output it should still write the
SUCCESS
file to indicate that it's finished and copilot would upload an empty outputs.pb file (if that's what the system needs).After copilot is done uploading the outputs, the process should exit. The raw container plugin should be changed to monitor the sidecar container (if it doesn't already). Usual task-timeouts apply.
Also improve documentation a bit. Documentation should state precisely what the contract is between the raw container, and the Flyte system. If it's necessary to configure the cluster a certain way in order to run this task type we should state so clearly.
Goal: What should the final outcome look like, ideally?
see above
Describe alternatives you've considered
see above.
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?