coiled / feedback

A place to provide Coiled feedback

Allow users to pass parameters to start_jobs #137

Closed FabioRosado closed 1 year ago

FabioRosado commented 3 years ago

While chatting with José Morales, he asked whether it was possible to pass parameters to the start_job command. Specifically, it would be good to let users submit files to this command without needing to recreate a cluster configuration.

It would be good to have a Job constructor similar to the Cluster one that would allow us to modify a job's configuration on the fly.
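A hypothetical sketch of what such a Job constructor might look like. None of this is an existing Coiled API; the class name, fields, and method are assumptions made purely to illustrate the proposal of holding per-run overrides the way coiled.Cluster holds cluster options:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical Job constructor -- NOT an existing Coiled API.
# It bundles per-run overrides so a saved job configuration
# could be tweaked on the fly instead of recreated each time.
@dataclass
class Job:
    configuration: str                    # name of a saved job configuration
    command: Optional[List[str]] = None   # override the configured command
    files: List[str] = field(default_factory=list)  # extra files to upload

    def start(self) -> None:
        # Would call coiled.start_job(self.configuration, ...)
        # applying the overrides above; left unimplemented here.
        raise NotImplementedError
```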

jose-moralez commented 3 years ago

I believe jobs are mostly designed for notebooks right now, but I find them very useful for batch-processing workloads, especially since they're fire-and-forget: an automated process (a Lambda, for example) can call coiled.start_job and forget about it. These workloads involve running a script (which would be uploaded via the files argument of coiled.create_job_configuration), but those scripts are hardly ever static, so having to fix command in the job configuration feels inflexible. I'd like to be able to specify command in coiled.start_job instead, or even upload files with it (in case your script takes a configuration file instead of arguments).

For example, an ETL script would probably take the path to its input file, so the call would look like python my_etl.py --path s3://mybucket/raw/20210325.parquet. With the current API, I'd have to redefine the job configuration just to change that path. It would be better to be able to specify either the full command or just the arguments when launching the job.

In summary, with the current behavior you have to do this every time you want to modify a parameter:

coiled.create_job_configuration(
  name='etl-20210325',
  software='my-etl-software',
  command='python my_etl.py --path s3://mybucket/raw/20210325.parquet'.split(),
  files=['my_etl.py'],
  ...)
coiled.start_job('etl-20210325')
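In the meantime, a small wrapper can at least automate the recreate-then-start dance. This is a hedged sketch built around the two calls shown above; the two functions are passed in as parameters (standing in for coiled.create_job_configuration and coiled.start_job) so the helper can be exercised without a Coiled account, and the name-derivation rule is an assumption:

```python
import shlex

def run_etl_job(path, create_config, start_job):
    """Recreate the job configuration for one input path, then start it.

    `create_config` / `start_job` stand in for
    coiled.create_job_configuration / coiled.start_job; they are
    injected so this sketch runs without a Coiled account.
    """
    # Derive a per-run name from the input file, e.g. "etl-20210325".
    name = "etl-" + path.rsplit("/", 1)[-1].split(".")[0]
    create_config(
        name=name,
        software="my-etl-software",
        # shlex.split also copes with quoted arguments, unlike str.split.
        command=shlex.split(f"python my_etl.py --path {path}"),
        files=["my_etl.py"],
    )
    start_job(name)
    return name
```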

If there was a way to send either the command or the arguments, you could reuse your job configuration:

coiled.start_job('etl', command='python my_etl.py --path s3://mybucket/raw/20210325.parquet'.split())  # full command
coiled.start_job('etl', args='--path s3://mybucket/raw/20210325.parquet'.split())  # just the arguments
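As an aside on building these command lists: str.split breaks on arguments that contain spaces, while the standard-library shlex.split respects shell-style quoting, so it may be the safer choice whichever API shape lands:

```python
import shlex

cmd = "python my_etl.py --path 'my file.parquet'"

# str.split splits inside the quoted filename, producing a broken argv;
# shlex.split keeps the quoted argument intact.
naive = cmd.split()
proper = shlex.split(cmd)
```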

shughes-uk commented 1 year ago

Jobs are deprecated.