Right now, when aztk.spark.client.Client.submit() is called in the AZTK Spark SDK,
it assumes that the jars and files fields of ApplicationConfiguration contain paths to local files.
In our case, the Spark job resources are already uploaded to Azure Blob Storage, so we want to avoid downloading and uploading them again.
From what I see, aztk.spark.client.Client.submit() calls generate_task, which uploads the files to blob storage, generates ResourceFiles for them, replaces the local paths with file names in the application config, and uploads that config to blob storage as an application.yml file.
I would like to have an option to provide resource_files directly to Client.submit() and thus skip uploading files.
Right now we use a workaround where we basically reimplement generate_task and generate the resource_files for our blobs ourselves. This seems brittle, as it is coupled to the AZTK SDK implementation and can break when AZTK changes in the future.
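To make the request more concrete, here is a rough sketch of the behaviour I have in mind. It is not AZTK code: `ResourceFileRef`, `is_remote` and `split_resources` are hypothetical names, and `ResourceFileRef` is only a stand-in for whatever resource file type the SDK would actually accept. The idea is that entries which already point at a blob URL are turned into resource file references directly, and only the remaining local paths go through the existing upload step.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Tuple
from urllib.parse import urlparse


@dataclass(frozen=True)
class ResourceFileRef:
    """Hypothetical stand-in for a resource file: the blob URL to fetch
    and the file name it should get on the node."""
    blob_source: str
    file_path: str


def is_remote(path: str) -> bool:
    """Assumption: anything with an http(s) scheme is already in storage."""
    return urlparse(path).scheme in {"http", "https"}


def split_resources(paths: List[str]) -> Tuple[List[str], List[ResourceFileRef]]:
    """Split entries into local paths that still need uploading and
    references to blobs that already exist."""
    to_upload: List[str] = []
    existing: List[ResourceFileRef] = []
    for path in paths:
        if is_remote(path):
            # Use the last segment of the URL path as the file name,
            # ignoring any query string such as a SAS token.
            name = PurePosixPath(urlparse(path).path).name
            existing.append(ResourceFileRef(blob_source=path, file_path=name))
        else:
            to_upload.append(path)
    return to_upload, existing


if __name__ == "__main__":
    local, remote = split_resources([
        "/home/me/app.jar",
        "https://acct.blob.core.windows.net/jobs/libs/dep.jar?sv=abc",
    ])
    print(local)
    print(remote)
```

An explicit resource_files argument on Client.submit() would work just as well for us. The sketch only illustrates that the two cases (local upload and existing blobs) can coexist in one call.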
I think this is a great feature. We should support both scenarios - local upload and referencing existing files in storage. Thanks for the feature request!