Open kevinstephano opened 1 year ago
This is a good test case. I think I know where the heuristic fails. This is probably related to https://github.com/csarofeen/pytorch/pull/2455
Despite having two reshapes, the second case produces 1 kernel with either float or half inputs. I'm not sure how that is happening, since with two reshapes it matches the "comment out C" pattern from https://github.com/csarofeen/pytorch/issues/2090#issuecomment-1398665847.
In the first case the segmenter is refusing to merge across the three connected components. I'm not sure this is due to the reshapes; I don't think merging across disconnected components is ever done (see this comment: https://github.com/csarofeen/pytorch/blob/devel/third_party/nvfuser/csrc/fusion_segmenter.cpp#L3275-L3279). For this particular case, since the three groups are independent, wouldn't three kernels actually be preferable?
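To make "three connected components" concrete, here is a minimal sketch in plain Python. It is not nvFuser code and not the reporter's repro: the node names (`in0`, `reshape0`, `out0`, ...) and the helper `connected_components` are hypothetical, chosen only to show three independent branches that share no tensors.

```python
from collections import defaultdict

def connected_components(nodes, edges):
    """Group nodes by reachability, treating dataflow edges as undirected."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            cur = stack.pop()
            comp.append(cur)
            for nxt in adj[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(comp))
    return groups

# Hypothetical graph: three branches, each input -> reshape -> output,
# with no edges between branches (the "horizontal" situation).
nodes = [f"{kind}{i}" for i in range(3) for kind in ("in", "reshape", "out")]
edges = []
for i in range(3):
    edges.append((f"in{i}", f"reshape{i}"))
    edges.append((f"reshape{i}", f"out{i}"))

groups = connected_components(nodes, edges)
```

Under this picture, a segmenter that only merges groups joined by a producer-consumer edge would never combine the three branches, which would yield one kernel per component.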
🐛 Describe the bug
I have a horizontal fusion situation with `reshape`, and I would like to understand whether it can be fused. I think we have a knob to turn this on or a place to switch this. Jie might know. It would be good if this could be 1 kernel.
Second case looks okay. Could you just double check it is okay with FP16 inputs?
Versions
TOT