aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0
2.09k stars 1.13k forks

SageMaker Bring Your Own Container on local mode - ProcessingOutput is not linked to local filesystem #3083

Open idanmoradarthas opened 2 years ago

idanmoradarthas commented 2 years ago

Describe the feature you'd like

While working with SageMaker BYOC in local mode (with the Python SDK), we found that the container's outputs are staged in the SageMaker default artifact bucket, and the SDK does not then download those artifacts to the local file system.

How would this feature be used? Please describe.

We want the SDK to automatically download the created artifacts to the local file system.

Describe alternatives you've considered

We had to build our own mechanism to download those files:

[image: Lack_of_impl_on_S3_outputs_download]

For the snippet of code in that image, we used https://github.com/aws-samples/amazon-sagemaker-local-mode/blob/main/scikit_learn_bring_your_own_container_local_processing/scikit_learn_bring_your_own_container_local_processing.py as a reference (and also used its output_config dictionary).
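The original snippet is not reproduced in this thread. As a rough illustration only, a manual download step of the kind described above might look like the sketch below. It is not the author's code and not an SDK feature. It assumes the job description has the same shape as a DescribeProcessingJob response (`ProcessingOutputConfig` -> `Outputs` -> `S3Output.S3Uri`) and that `s3_client` behaves like a boto3 S3 client. The function name `download_processing_outputs` is hypothetical.

```python
import os
from urllib.parse import urlparse


def download_processing_outputs(job_description, dest_dir, s3_client):
    """Copy each ProcessingOutput's S3 objects into dest_dir/<OutputName>/.

    Hypothetical helper: `job_description` is assumed to look like a
    DescribeProcessingJob response, and `s3_client` is assumed to expose
    get_paginator("list_objects_v2") and download_file like a boto3 client.
    """
    downloaded = []
    for output in job_description["ProcessingOutputConfig"]["Outputs"]:
        parsed = urlparse(output["S3Output"]["S3Uri"])
        bucket = parsed.netloc
        # Normalise the prefix so "out" does not also match "output2/...".
        prefix = parsed.path.strip("/")
        if prefix:
            prefix += "/"

        target_root = os.path.join(dest_dir, output["OutputName"])
        paginator = s3_client.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                key = obj["Key"]
                if key.endswith("/"):
                    continue  # skip folder placeholder objects
                relative = key[len(prefix):]
                local_path = os.path.join(target_root, *relative.split("/"))
                os.makedirs(os.path.dirname(local_path), exist_ok=True)
                s3_client.download_file(bucket, key, local_path)
                downloaded.append(local_path)
    return downloaded
```

With the SDK, the description would typically come from something like `processor.jobs[-1].describe()` and the client from `boto3.client("s3")`. Note that this still needs S3 access, so it works around the missing download but not the request for a fully offline local mode.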

shreyapandit commented 2 years ago

Hi @idanmoradarthas

Thank you for your feedback! We will bring this to the team and will work on discussing and prioritizing this enhancement as part of our roadmap.

Regards, Shreya

idanmoradarthas commented 2 years ago

Hi @shreyapandit, thank you so much for the response.

I do want to emphasize that Outseer, my company, would benefit greatly from a local mode that is completely local, with no internet connection, as we use local mode for our testing suite. You did solve the initial stage for us in issue #3084. I know this issue is about the auto-download, but in your solution we would very much appreciate it if, after the container has finished its run, the output were copied to the local output location on the PC without reaching S3 or opening an internet connection.

dsradecki commented 1 year ago

Hi @shreyapandit. Would you be able to share any insight into when this could be resolved? Specifically, I mean the fact that sagemaker.processing.Processor seems to completely ignore a session initialised like:

from sagemaker.local import LocalSession

sagemaker_session = LocalSession()
sagemaker_session.config = {'local': {'local_code': True}}

This is because it still requires default_bucket to be specified, whereas sagemaker.estimator.Estimator works without it.

clausagerskov commented 1 year ago

The core of this issue seems to be the default_bucket definition in the local session. Even though the bucket is specified when creating the session, the SageMaker SDK still runs the whole _create_s3_bucket_if_it_does_not_exist step, which requires an internet connection and configured credentials. That blocks solutions such as LocalStack for mocking S3.

clausagerskov commented 1 year ago

This is marked as fixed here, but it is not actually fixed: https://github.com/aws/sagemaker-python-sdk/issues/3084

clausagerskov commented 1 year ago

@shreyapandit