Describe the feature you'd like
Keep local inputs and outputs local in processing jobs when using local mode.

How would this feature be used? Please describe.
Currently, all local inputs are uploaded to the default bucket specified on the SageMaker session; there is no differentiation between local mode and non-local mode. https://github.com/aws/sagemaker-python-sdk/blob/22ba84e711cfd688b836b0a092020cf02aa08b97/src/sagemaker/processing.py#L300-L318

This has two issues:

1. `LocalSession` does not take a `default_bucket` argument, so the bucket used for the upload can only be changed by modifying the session's private attribute after creation, which is hacky to say the least. As far as I can tell, this behavior is also not documented anywhere.
2. It is not ideal to upload all local inputs to S3 only to download them into the container again in local mode. Local mode should be truly local when inputs/outputs are local paths.

Describe alternatives you've considered
Using S3 paths for inputs and outputs works, but it slows down local mode since the SDK will download/upload those inputs/outputs each time. Local mode should enable quick local testing independent of cloud resources such as S3 buckets.
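For illustration, the private-attribute workaround mentioned above can be sketched as follows. Since constructing a real `LocalSession` requires AWS configuration, the snippet uses a minimal stand-in class; the attribute name `_default_bucket` mirrors `sagemaker.session.Session` and is an assumption that may not hold across SDK versions:

```python
# Stand-in mimicking LocalSession's bucket handling, to show why the
# current workaround is fragile. The real class is
# sagemaker.local.LocalSession; `_default_bucket` is an assumed
# private attribute name and is not a public, documented API.

class LocalSessionStandIn:
    """Minimal stand-in for the SDK session's default-bucket behavior."""

    def __init__(self):
        # There is no constructor argument to set the bucket.
        self._default_bucket = None

    def default_bucket(self):
        # Falls back to an account/region-derived name when unset.
        if self._default_bucket is None:
            self._default_bucket = "sagemaker-<region>-<account-id>"
        return self._default_bucket


session = LocalSessionStandIn()
# Hacky workaround: overwrite the private attribute after creation.
session._default_bucket = "my-local-test-bucket"
print(session.default_bucket())  # uses the overridden bucket
```

A public `default_bucket` constructor argument on `LocalSession` (or skipping the upload entirely for local paths) would remove the need to touch private state.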