aws / sagemaker-distribution

A set of Docker images that include popular frameworks for machine learning, data science and visualization.
Apache License 2.0
89 stars 50 forks source link

Add `*.env.out` to `.gitignore` #96

Closed dlqqq closed 9 months ago

dlqqq commented 11 months ago

To my knowledge, *.env.out files can change on rebuild even if no code changes were made.

With this, I propose that our next steps are to:

  1. Verify that *.env.out files can change on rebuild even when no new packages are added to *.env.in, and verify that they're not being used for any purpose other than to track what packages went into a release
  2. Add *.env.out to .gitignore
  3. Add a GitHub rule enforcing that new PRs are rebased onto the latest commit of the target branch, as merge conflicts will no longer stop contributors from merging PR branches atop an old commit.
dylanraws commented 11 months ago

The *.env.out files allow the environment to be reproduced without using docker, as stated in the README.

Amazon SageMaker Distribution supports full reproducibility of Conda environments, so you don't necessarily need to use Docker. Just find the version number you want to use in the build_artifacts directory, open one of cpu.env.out or gpu.env.out and follow the instructions in the first 2 lines.

SageMaker Studio Lab currently consumes those files for this purpose. Other SageMaker services may follow this route, such as Notebook Instances since its environment is not container based.

dlqqq commented 10 months ago

@dylanraws Thanks! This explains the current precedent of only updating the env.out files on release. Then this issue is still valid, since we only want to commit changes manually & explicitly immediately prior to release.

ketan-vijayvargiya commented 9 months ago

Verify that *.env.out files can change on rebuild even when no new packages are added to *.env.in, and verify that they're not being used for any purpose other than to track what packages went into a release

This is true.

IMO the *.env.out files should be generated once as the last step of the image build process. Now, if they are generated before or not - that shouldn't be a problem as long as they are reliably overridden at the end.

So, I am not sure why we need such a change at all.

An alternative could be we stop generating env.out files by default. Plus, we add a new parameter to the build command, that explicitly generates this file, and only the final build step is the one that leverages this parameter.

balajisankar15 commented 9 months ago

env.out doesn't change if you re-build. It might change only if you use the --force flag.

https://github.com/aws/sagemaker-distribution/pull/57.

When we do our releases , we don't use the --force flag by default.