krzysz00 opened 3 weeks ago
Attention: Patch coverage is 83.67347% with 32 lines in your changes missing coverage. Please review.
Project coverage is 78.00%. Comparing base (9ef8698) to head (0435394). Report is 4 commits behind head on develop.
Apologies for the mega-commit (it also includes a cherry-pick from upstream).
Previously, input-fusing an elementwise function like (x) => (x + 4) gave incorrect results: the 4 was added not only to the elements of the input value but also to the padding values introduced during tiling.
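A small worked illustration of why this matters, assuming a K dimension of length 3 tiled with a tile size of 4 (so one zero of padding is appended), and assuming, for illustration only, that the same function is fused into both GEMM operands. The helper names (pad, fused, dot) are hypothetical and plain Python lists stand in for register tiles; this is not rocMLIR code.

```python
# Hypothetical illustration: K = 3, tile size 4, so each operand
# gets one padding slot (value 0) appended during tiling.
TILE = 4


def pad(values, tile=TILE):
    """Zero-pad a list up to a multiple of `tile`, as tiling does."""
    rem = (-len(values)) % tile
    return values + [0] * rem


def fused(x):
    # The input-fused elementwise function from the description.
    return x + 4


def dot(a, b):
    """Reduction over K, as the GEMM would compute it."""
    return sum(x * y for x, y in zip(a, b))


a = [1, 2, 3]
b = [1, 1, 1]

# Reference: the fused function is applied to the real elements only.
reference = dot([fused(x) for x in a], [fused(y) for y in b])

# Naive fusion: the function also runs on the padding, turning each 0
# into 4, so the padded slot now contributes 4 * 4 to the reduction.
naive = dot([fused(x) for x in pad(a)], [fused(y) for y in pad(b)])

print(reference, naive)  # prints "90 106"
```

The padded slot is supposed to contribute nothing to the reduction; once it holds f(0) = 4 instead of 0, it silently adds 16 to the result.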
To fix this, we:
That is, we make the threadwise_read_into ops that read each input to the generic record their validities. Then, after the generic runs, we do a register-to-register threadwise_read_into (which would ordinarily be a plain memcpy()) using the dynamic validities taken from those reads, which re-applies the zero mask.
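The masking scheme described above can be sketched in plain Python, assuming a single-input generic and modelling registers as lists. The functions read_with_validity, masked_copy and load_fused_tile are hypothetical stand-ins for the threadwise_read_into ops, not their actual IR or API.

```python
from typing import Callable, List, Tuple

TILE = 4


def read_with_validity(src: List[float], start: int,
                       tile: int = TILE) -> Tuple[List[float], List[bool]]:
    """Hypothetical stand-in for the input read: load one tile from `src`,
    filling out-of-bounds slots with 0.0 and recording which slots were
    actually in bounds (the validities)."""
    values, valid = [], []
    for i in range(start, start + tile):
        in_bounds = 0 <= i < len(src)
        values.append(src[i] if in_bounds else 0.0)
        valid.append(in_bounds)
    return values, valid


def masked_copy(regs: List[float], valid: List[bool]) -> List[float]:
    """Hypothetical stand-in for the register -> register read: a plain copy
    when every slot is valid, otherwise invalid slots are reset to 0.0."""
    return [v if ok else 0.0 for v, ok in zip(regs, valid)]


def load_fused_tile(src: List[float], start: int,
                    fn: Callable[[float], float]) -> List[float]:
    """Read a tile, run the fused elementwise function on every slot
    (padding included), then re-apply the recorded zero mask."""
    values, valid = read_with_validity(src, start)
    after_generic = [fn(v) for v in values]
    return masked_copy(after_generic, valid)


print(load_fused_tile([1.0, 2.0, 3.0], 0, lambda x: x + 4))
# prints [5.0, 6.0, 7.0, 0.0]
```

Running the generic on every slot and masking afterwards keeps the elementwise code branch-free; only the final copy consults the validities that were recorded during the original read.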
Note: while I was here, I noticed that non-xdlops gemms were producing two sets of threadwise_read_into ops, so I fixed that, which broke a few tests.