Nukesor / pueue

:stars: Manage your shell commands.
MIT License

feat: use S3 to save std output and error #515

Status: Closed (linux-china closed this issue 3 months ago)

linux-china commented 3 months ago

A detailed description of the feature you would like to see added.

I currently use pueue to run various scripts that generate CSV files, which are then consumed by DuckDB and ClickHouse. Would it be possible for pueue to write stdout as an object in an S3 bucket?

object name: pueue_instance/task_id-status.txt
object metadata: Task id, Status, Command, Path, Start, End
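To make the requested naming scheme concrete, here is a minimal sketch of how an object key and its metadata could be derived from a task. Nothing here is part of Pueue: `TaskInfo`, `object_key`, and `object_metadata` are hypothetical names, and the metadata keys are just one possible spelling of the fields listed above.

```python
from dataclasses import dataclass


@dataclass
class TaskInfo:
    """Hypothetical record holding the fields named in the request."""
    task_id: int
    status: str
    command: str
    path: str
    start: str
    end: str


def object_key(instance: str, task: TaskInfo) -> str:
    # Follows the proposed pattern: pueue_instance/task_id-status.txt
    return f"{instance}/{task.task_id}-{task.status}.txt"


def object_metadata(task: TaskInfo) -> dict[str, str]:
    # Object metadata values are plain strings, so every field is stringified.
    return {
        "task-id": str(task.task_id),
        "status": task.status,
        "command": task.command,
        "path": task.path,
        "start": task.start,
        "end": task.end,
    }
```

For a task with id 7 and status `Success` on an instance called `pueue_instance`, the key would be `pueue_instance/7-Success.txt`.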

Explain your usecase of the requested feature

Why this feature?

  1. Easy to back up output with S3
  2. Scripts are data sources for data processing
  3. Local disk cleanup and some I/O problems
  4. Most data tools use S3 as data storage by default.
  5. Easy to trigger with S3 object creation notifications

Alternatives

No response

Additional context

No response

Nukesor commented 3 months ago

Nope, this most definitely doesn't fit into the scope of this project. It's designed for single user on-system usage. There's no need for any cloud service integration.

See https://github.com/Nukesor/pueue?tab=readme-ov-file#design-goals

It seems to be possible to mount S3 buckets to the local file system, so it should be somewhat feasible to do anyway. There won't be official support for it though.

Regarding your statements:

Easy to back up output with S3

In Pueue, logs aren't supposed to be backed up. They're cleaned up whenever a task is removed. For persistent logs and permanent traceability, please look for another tool.

Scripts are data sources for data processing

Pueue isn't designed to be used as a scriptable task scheduler. Please don't use it as a task executor in a processing pipeline. There's no official support for this kind of use, and there are better tools out there, such as Slurm.

Local disk cleanup and some I/O problems

Could you elaborate on this point?

Most data tools use S3 as data storage by default.

What kind of data tools are you talking about? Why does log output need to be processed by data tools?

Easy to trigger with S3 object creation notifications

Pueue has its own callback system.
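As an unofficial illustration of how the callback system and a locally mounted bucket could be combined, here is a minimal sketch of a helper that a callback command might invoke. It assumes the callback passes the task id, the final status, and the path of the task's log file as arguments, and that `target_dir` is a directory backed by a mounted bucket. The names `archive_log` and `main` are hypothetical, and how the arguments get filled in depends on the template variables your Pueue version documents.

```python
import shutil
from pathlib import Path


def archive_log(log_file: Path, target_dir: Path, task_id: str, status: str) -> Path:
    """Copy a finished task's log to target_dir as <task_id>-<status>.txt."""
    # target_dir is assumed to be a locally mounted bucket, or any directory.
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / f"{task_id}-{status}.txt"
    shutil.copyfile(log_file, destination)
    return destination


def main(argv: list[str]) -> int:
    # Expected arguments (an assumption): task id, status, log file, target dir.
    if len(argv) != 4:
        print("usage: archive_log.py TASK_ID STATUS LOG_FILE TARGET_DIR")
        return 2
    task_id, status, log_file, target_dir = argv
    archive_log(Path(log_file), Path(target_dir), task_id, status)
    return 0
```

A wrapper script would call `main(sys.argv[1:])`. The copy has to happen before the task is removed, because the log is cleaned up at that point.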