aws-samples / sagemaker-101-workshop

Hands-on demonstrations for data scientists exploring Amazon SageMaker
77 stars 48 forks source link

Upgrade to TensorFlow v2. #14

Closed tom5610 closed 2 years ago

tom5610 commented 3 years ago

Issue #, if available: #2

Description of changes: This is to upgrade TensorFlow Keras NLP notebook to V2. Also, I've updated the model training instance type to 'ml.m5.xlarge' to avoid gpu instance contention and add a comment to the code snippet in regard word embedding file download.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

athewsey commented 3 years ago

Thanks for making a start on this @tom5610!

A couple of action items:

  1. We should really upgrade the MNIST challenge at the same time as NLP, because typically participants are referring to the NLP example to help them port the MNIST code.
    • This will normally mean 2 PRs: One to main branch updating the local notebook (and the instructions if needed); and one to solution updating the reference solution notebook. By this repo's branching strategy it's okay to merge main->solution but not the other way around.
    • If this looks like it'd be a big code change, or if you don't have time to spend on it at the moment, then let's coordinate with @rosalieandico because I think she's planning some more updates to MNIST soon
  2. We do need to update the kernel selection notes in the notebook and any other affected docs/comments (this is not big so I can pick it up as part of merging if needed, but if you're already doing updates and can sweep it up that'd be helpful)
  3. I'll just need to double-check with some of our team that we're ready to move forward with this change
    • As I understand it, the proposed change would break the workshop on SageMaker Notebook Instances
      • NBIs currently offer TFv2 kernels out-of-the-box only when created with the notebook-al2-v1 "platform ID"
      • AL1 is still the default option today (this blog mentions an original target to change this default by April 2022)
      • CloudFormation doesn't currently offer a platform ID parameter on the NotebookInstance resource
      • In a workshop setting, our NBIs are typically created ready for users by CloudFormation
    • Although we use Studio by default, we understand NBIs are still important to many customers so it's a good option to have
    • @tom5610 Are we lucky enough that the new model.save method works fine on TFv1.15 as well? If so, we might be fine already because users could just run the same code on TFv1 kernels if needed!
    • If not, maybe we can proceed anyway on the basis that broken NBI flow should be a temporary issue - fixable either when we have a CloudFormation option or the default changes to AL2, whichever comes first.
athewsey commented 3 years ago

Update: Seems like PlatformIdentifier parameter is now supported on CloudFormation NotebookInstance resource, which unblocks my point 3 above! 🎉