csarofeen / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
http://pytorch.org

Mega issue tracking `torchbenchPerf` on benchmark runs #2065

Open jjsjann123 opened 2 years ago

jjsjann123 commented 2 years ago

🐛 Describe the bug

Just a common place for tracking all issues that we run into with our benchmark runs.

Functional Issues

New bugs round 2 with repro under code base: https://github.com/csarofeen/pytorch/issues/2065#issuecomment-1279422075

Needs more triage to determine who should fix:

Performance Issues

Versions

torchbenchPerf branch

commit 94d159f5f1306fbfff4ae43ccd196a0675317889 (csarofeen/torchbenchPerf)
Author: jjsjann123 <jiej@nvidia.com>
Date:   Wed Oct 12 17:30:17 2022 -0700

    patching dfs by excluding dfs within partition

Note that all issues were repro'ed at the commit above. We might cherry-pick more commits with fixes; any new issue reported here should include its commit if it differs.

jjsjann123 commented 2 years ago

Current mega container we are using is: gitlab-master.nvidia.com/kstephano/pytorch:torchbenchPerf_cg_dynamo_3

Note that we'd want to cherry-pick two more PRs for perf reasons:

  1. transpose changes: https://github.com/pytorch/pytorch/pull/86967
  2. silu_backward changes: https://github.com/pytorch/pytorch/commit/fc3afc840784106b173c87c95b1ee96a4018bb3d

The container includes Ivan's torchdynamo branch:

commit 1e72e26d29d7f07f13befa134785d494b6c7e830 (HEAD, test/nvfuser-cudagraphify)
Author: Ivan Yashchuk <ivan.yashchuk@aalto.fi>
Date:   Wed Oct 12 16:29:37 2022 +0300

    Use align_inputs from inductor

The conv/bias decomposition is in there but not checked in; with that code diff applied, it's effectively:

commit 4f4ffba4a227b332a19121eeffc0b8b490e6d22c (jiej/wip)
Merge: 5491aa4d 1e72e26d
Author: jjsjann123 <alex.jann2012@gmail.com>
Date:   Thu Oct 13 16:41:59 2022 -0700

    Merge remote-tracking branch 'ivan/nvfuser-cudagraphify' into conv2d_decomp
jjsjann123 commented 2 years ago

I have cherry-picked Ivan's PR. This is the new branch head:

commit b3f4b3032deceabefbef681e56d99b5a6daa9da4 (csarofeen/torchbenchPerf)
Author: jjsjann123 <jiej@nvidia.com>
Date:   Fri Oct 14 15:04:15 2022 -0700

    cherry-picking PR https://github.com/pytorch/pytorch/pull/86967

commit 5863911099541f2a16d4699d6815850dae693cb1 (ivan/nvprims-transpose-partitioner)
Merge: 73669a71003 fc3afc84078
Author: Ivan Yashchuk <IvanYashchuk@users.noreply.github.com>
Date:   Fri Oct 14 22:58:55 2022 +0300

    Merge branch 'master' into nvprims-transpose-partitioner
jjsjann123 commented 2 years ago

A note on some side issues we've been discussing: we'd want to track these with upstream folks so we'll have better debuggability in future perf analysis:

  1. The aot_autograd traced graph differs between runs. -> This is painful for debugging, especially given that our fusion_id also doesn't align across those runs.
  2. The partitioner uses a Set, which could give us fusion partitions that end up in the wrong order. I tend to think this one is safe, since the partitioner is only supposed to be queried to check whether a given node is within a partition, but that's implementation-specific; I should double-check before drawing a conclusion. Alternatively, it should be easy to switch it to a Dict, which preserves insertion order and is therefore deterministic.
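A minimal sketch of the Set-to-Dict swap suggested in point 2 (this is not the actual partitioner code; the function and node names are made up for illustration). A `dict` with `None` values behaves like a set for membership queries but iterates in insertion order, which is guaranteed in Python 3.7+:

```python
def build_partition(nodes):
    # A plain set() would make iteration order hash-dependent; dict keys
    # preserve insertion order, so iterating the partition is deterministic.
    return dict.fromkeys(nodes)

partition = build_partition(["conv2d", "bias_add", "relu"])

# Membership queries behave exactly like a set:
assert "relu" in partition
# ...and iteration order is the insertion order, run after run:
assert list(partition) == ["conv2d", "bias_add", "relu"]
```

If the partition is genuinely only queried for membership, the Set is harmless; the Dict variant just makes any accidental iteration deterministic for free.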
jjsjann123 commented 2 years ago

Upstream has switched backends for benchmark runs: https://github.com/pytorch/pytorch/pull/88437

Linking it here to track our progress on upstreaming fixes.

jjsjann123 commented 2 years ago

Note to self: the convolution(_backward) bias decomp does not seem to help with our benchmarks (https://github.com/pytorch/torchdynamo/pull/1645), so we won't be pushing it upstream.
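The decomposition itself isn't shown in the thread; the sketch below is an assumed minimal version of what a conv/bias decomposition looks like (the function name `conv2d_bias_decomposed` is hypothetical): run the convolution without its bias, then add the bias as a separate elementwise op, so a fuser can fuse the add with neighboring pointwise ops.

```python
import torch

def conv2d_bias_decomposed(x, weight, bias, stride=1, padding=0):
    # Convolution without the bias term...
    out = torch.nn.functional.conv2d(x, weight, bias=None,
                                     stride=stride, padding=padding)
    # ...then the bias as a standalone broadcasted add over the channel dim.
    return out + bias.view(1, -1, 1, 1)

x = torch.randn(2, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
b = torch.randn(4)

ref = torch.nn.functional.conv2d(x, w, bias=b)
dec = conv2d_bias_decomposed(x, w, b)
assert torch.allclose(ref, dec, atol=1e-5)
```

Numerically the two forms agree; whether splitting the add out actually pays off depends on the fuser and the surrounding graph, which is presumably why it didn't help in these benchmarks.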