Open mfbalin opened 2 months ago
To trigger regression tests:
@dgl-bot run [instance-type] [which tests] [compare-with-branch]
;
For example: @dgl-bot run g4dn.4xlarge all dmlc/master
or @dgl-bot run c5.9xlarge kernel,api dmlc/master
@frozenbugs FYI.
Covering different version combination of torch, cuda is not necessary for CI. CI is just cover the basic unit tests. Instead, our regression or release pipeline should cover (Unfortunately it's not covered yet).
Covering different version combination of torch, cuda is not necessary for CI. CI is just cover the basic unit tests. Instead, our regression or release pipeline should cover (Unfortunately it's not covered yet).
Then shall we remove the newly added pytorch version related things and merge the refactored remaining portion? There was quite a lot of duplication in some files. Maybe you want to personally test before merging so it is okay to keep this PR open until we can actually test it.
Description
We support torch 2.1.0 as our minimum. The latest torch version we support is 2.3.1. The least we can do is to run tests for both versions in the CI to be more robust. @Rhett-Ying could you help achieve that? This PR is simply building our container to include 2 environments for pytorch,
pytorch-ci
for the latest torch we support andpytorch-min-ci
for the oldest torch we support. Feel free to push to this branch if it is not mergeable like this, I opened this PR to kickstart the effort and help you.Motivation: prevent mistakes like #7521 from occurring again.
I think it would be best if we modified the CI related files so that we can easily run the tests for any number of torch versions. The ideal case is to test all the pytorch versions that we claim to support in the CI.
Checklist
Please feel free to remove inapplicable items for your PR.
Changes