NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

[WIP] (Yet another) indexing war for resize #3454

Open naoyam opened 1 day ago

naoyam commented 1 day ago

This is a WAR (workaround) for #3455. Exact-graph-based indexing doesn't work because of the mapping introduced by the residual path. I think we should investigate what the right graph should look like for indexing, but to unblock the scheduler for RoPE, this PR works around the issue by creating a local graph that includes only the tensors involved in the expression to index, thus removing the effect of the residual path.

IndexingTraversal::getExprsBetweenForResize is the main addition. It creates a new IdModel consisting only of the tensors of a given expr. If a resize is used in any of the producers or consumers of the expr, we use the path found by the local model. Currently, if it fails to find a path, it's considered an error.
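To make the idea of a local graph concrete, here is a minimal, self-contained sketch. It is a toy model, not nvFuser code: `ToyIdGraph`, `ToyExpr`, `toyFusion`, `buildGraph`, and `buildLocalGraph` are hypothetical names, and the fusion shape (a resize-based op on `tv0` producing `tv1`, followed by `tv2 = tv1 + tv0` as the residual path) is an assumption about the kind of pattern involved. An ID graph is reduced to a union-find over ID names, and each expression contributes only its producer-consumer ID mappings.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an ID graph: a union-find over iteration-domain names.
// This is NOT nvFuser's IdModel; every name in this sketch is hypothetical.
class ToyIdGraph {
 public:
  void mapIds(const std::string& a, const std::string& b) {
    const std::string ra = find(a);
    const std::string rb = find(b);
    if (ra != rb) {
      parent_[ra] = rb;
    }
  }

  bool areMapped(const std::string& a, const std::string& b) {
    return find(a) == find(b);
  }

 private:
  std::string find(const std::string& id) {
    auto it = parent_.find(id);
    if (it == parent_.end()) {
      parent_[id] = id;  // first time we see this ID: it is its own root
      return id;
    }
    const std::string parent = it->second;
    if (parent == id) {
      return id;
    }
    const std::string root = find(parent);
    parent_[id] = root;  // path compression
    return root;
  }

  std::map<std::string, std::string> parent_;
};

// An expression, reduced to the producer-consumer ID pairs it maps.
struct ToyExpr {
  std::string name;
  std::vector<std::pair<std::string, std::string>> mapped_ids;
};

// Assumed pattern: tv1 = resize-based op on tv0, tv2 = tv1 + tv0.
// "tv1.root" is resized into "tv1.out"; the resize itself is not a mapping.
std::vector<ToyExpr> toyFusion() {
  return {
      {"resize_op", {{"tv0.i", "tv1.root"}}},
      {"residual_add", {{"tv1.out", "tv2.i"}, {"tv0.i", "tv2.i"}}},
  };
}

ToyIdGraph buildGraph(const std::vector<ToyExpr>& exprs) {
  ToyIdGraph graph;
  for (const auto& expr : exprs) {
    for (const auto& ids : expr.mapped_ids) {
      graph.mapIds(ids.first, ids.second);
    }
  }
  return graph;
}

// Graph restricted to the single expression being indexed.
ToyIdGraph buildLocalGraph(
    const std::vector<ToyExpr>& exprs,
    const std::string& expr_name) {
  std::vector<ToyExpr> local;
  for (const auto& expr : exprs) {
    if (expr.name == expr_name) {
      local.push_back(expr);
    }
  }
  return buildGraph(local);
}

void checkToyFusion() {
  ToyIdGraph global = buildGraph(toyFusion());
  ToyIdGraph local = buildLocalGraph(toyFusion(), "resize_op");
  // The residual add maps the resize input and output in the global graph.
  assert(global.areMapped("tv1.root", "tv1.out"));
  // The local graph keeps them distinct.
  assert(!local.areMapped("tv1.root", "tv1.out"));
}
```

In this toy, the whole-fusion graph ends up with the input and output of the resize in the same group because of the residual add, while the graph built from the resize expression alone keeps them apart. Calling `checkToyFusion()` exercises both cases.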

While this WAR works for the prototype scheduler for RoPE so far (#3425), it does have some issues as well. For example, the local IdModel doesn't have all the information necessary to identify loop promotions, but the loop domain of the expr may be promoted, so it may not be possible to find the corresponding IDs within the local model. In other words, if resize is used with inlined broadcast IDs, getExprsBetweenForResize may fail to find a path, which would then fall back to the existing path, which may not be correct in the case of #3455. However, this can be avoided by scheduling the loop domains such that no promotion analysis is required. We can now do this by using things like TensorDomain::broadcast() and scheduler_tools::scheduleLoopDomainsLike(), so I don't think this issue is a blocker.

The other changes across the PR are due to the interface change of IndexingTraversal::getExprsBetween, which now takes std::vector<IterDomain*> instead of ValGroups, since the local IdModel requires the former.

naoyam commented 1 day ago

!test

naoyam commented 23 hours ago

!test --diff

naoyam commented 21 hours ago

!test

naoyam commented 4 hours ago

@jacobhinkle @zasdfgbnm While the H100 tests are still blocked, could you please start reviewing this PR? I'll also do the codegen diff tests once the H100 tests are completed.