getindata / kedro-sagemaker

Kedro Plugin to support running pipelines on AWS SageMaker.
https://kedro-sagemaker.readthedocs.io
Apache License 2.0
18 stars 6 forks source link

Propose to update the defaults in the template configuration #13

Closed noklam closed 1 year ago

noklam commented 1 year ago

Hi, thank you for creating this plugin. I have tried to play around with it and successfully run my pipeline on SageMaker.

Context

This is mainly from the perspective of someone who just wants to try out the plugin and see how it works. I try to document the challenges that I encountered and I hope this will smooth out the onboarding process a bit.

Proposed Changes

Thanks for creating the plugin, the video is also quite useful as it provides extra instructions that aren't covered in the documentation.

marrrcin commented 1 year ago

Thanks @noklam for your suggestions! I'm really happy that the plugin worked for you 🙂

The changes you've suggested will definitely improve the end user experience.

Populate a link to the corresponding experiment automatically - I am not sure if is this possible, but when I first run the pipeline I struggle to see my pipeline until later I find out I need to create a profile and check my pipeline in Studio. It would be nice if there is a clickable link in the terminal.

Do you mean something like displaying the URL for the started pipeline in the terminal (like we do for Azure ML for example)? I was looking for a way to show it somehow, but I was not able to find an easy way to do this yet - it might required some more research + I'm not sure whether SageMaker supports deep linking at all 🤔

noklam commented 1 year ago

@marrrcin Yeah, I am thinking of a link that will lead users to the experiment page automatically. It's great that you have already considered that, I understand it could be tricky to figure out how to implement it.

marrrcin commented 1 year ago

@noklam , please also modify the unit tests:

  _____________ test_should_generate_pipeline_with_processing_steps ______________

  context_mock = <MagicMock name='KedroContext' id='140018508686096'>
  no_mlflow = None

      @patch("kedro.framework.project.pipelines", {"__default__": sample_pipeline})
      @patch("kedro.framework.context.KedroContext")
      def test_should_generate_pipeline_with_processing_steps(context_mock, no_mlflow):
          # given
          config = _CONFIG_TEMPLATE.copy(deep=True)
          generator = KedroSageMakerGenerator(
              "__default__", context_mock, config, is_local=False
          )

          # when
          pipeline = generator.generate()

          # then
          assert isinstance(pipeline.sagemaker_session, pipeline_context.PipelineSession)
  >       assert pipeline.name == "kedro-sagemaker-pipeline"  # a default one
  E       AssertionError: assert 'my-pipeline' == 'kedro-sagemaker-pipeline'
  E         - kedro-sagemaker-pipeline
  E         + my-pipeline
noklam commented 1 year ago

@marrrcin How do I run the test here? The CI doesn't seem to run correctly. I have some trouble setting the environment up when I do pip install . It has trouble finding the correct version of boto3

marrrcin commented 1 year ago

We're using Poetry in this repo, the setup is described here: https://kedro-sagemaker.readthedocs.io/en/latest/source/04_development.html

noklam commented 1 year ago

the test looks fine now, sonar is failing with missing token which I can't fix. @marrrcin