FabioRosado closed this issue 1 year ago
I believe jobs are mainly designed for notebooks right now, but I find them very useful for batch-processing workloads, especially since they're fire-and-forget: an automated process (a lambda, for example) can call coiled.start_job
and just forget about it. These kinds of workloads involve running a script (which would be uploaded via the files
argument of coiled.create_job_configuration
); however, those scripts are rarely static, so having to specify command
in the job configuration feels very inflexible. I'd like to be able to specify command
in coiled.start_job
instead, or even be able to upload files with it (in case your script takes a configuration file instead of arguments).
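To make the fire-and-forget pattern concrete, here is a minimal sketch of such an automated trigger, assuming an AWS Lambda handler and a pre-created job configuration named 'etl'; the event shape and names are illustrative, and only coiled.start_job is taken from the examples below:

import coiled

def handler(event, context):
    # Key of the S3 object that triggered the Lambda (illustrative event shape)
    key = event['Records'][0]['s3']['object']['key']
    # With the current API the command is baked into the 'etl' configuration,
    # so only the configuration name can be passed here.
    coiled.start_job('etl')
    return {'started': True, 'key': key}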
For example, an ETL script would probably take the path to its input file, so the call would look like python my_etl.py --path s3://mybucket/raw/20210325.parquet
, but with the current API I'd have to redefine the job configuration to use that path. I think it'd be better to be able to specify either the full command or just the arguments when launching the job.
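For context, the script referenced above could look roughly like this; only the --path argument comes from the example, while the rest (argparse, dask.dataframe, the output path) is an assumption for illustration:

# my_etl.py -- hypothetical sketch of the script uploaded via files
import argparse
import dask.dataframe as dd

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--path', required=True,
                        help='Input Parquet path, e.g. s3://mybucket/raw/20210325.parquet')
    args = parser.parse_args()
    df = dd.read_parquet(args.path)
    # ... transformations would go here ...
    df.to_parquet(args.path.replace('/raw/', '/clean/'))

if __name__ == '__main__':
    main()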
In summary, with the current behavior you have to do this every time you want to modify a parameter:
coiled.create_job_configuration(
    name='etl-20210325',
    software='my-etl-software',
    command='python my_etl.py --path s3://mybucket/raw/20210325.parquet'.split(),
    files=['my_etl.py'],
    ...)
coiled.start_job('etl-20210325')
If there were a way to send either the command or the arguments, you could reuse your job configuration:
coiled.start_job('etl', command='python my_etl.py --path s3://mybucket/raw/20210325.parquet'.split()) # full command
coiled.start_job('etl', args='--path s3://mybucket/raw/20210325.parquet'.split()) # just the arguments
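With either variant, a single configuration could be reused across a daily batch of inputs, for example from a scheduler or lambda. Note that the args keyword here is the proposed API, not something that exists today:

import coiled

for date in ['20210325', '20210326', '20210327']:
    # proposed usage: one reusable 'etl' configuration, per-run arguments
    coiled.start_job('etl', args=['--path', f's3://mybucket/raw/{date}.parquet'])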
jobs is deprecated
While chatting with José Morales, he asked whether it was possible to pass parameters to the start_job
command. Specifically, it would be good to allow users to submit files through this command without having to recreate a job configuration. It would also be good to have a Job constructor, similar to the Cluster one, that would let us modify a job's configuration on the fly.
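A rough sketch of what such a Job constructor could look like, mirroring coiled.Cluster; none of this exists in the current API, it only illustrates the request:

import coiled

# Hypothetical interface: configuration pieces are set on the object and can
# be overridden per run instead of being baked into a named configuration.
job = coiled.Job(
    software='my-etl-software',   # reuse an existing software environment
    files=['my_etl.py'],          # upload the script at submission time
)
job.run(command='python my_etl.py --path s3://mybucket/raw/20210325.parquet'.split())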