Implement a separate-and-fuse parallel level

finch-tensor / Finch.jl

Sparse and Structured Tensor Compiler

http://willowahrens.io/Finch.jl/

MIT License

Implement a separate-and-fuse parallel level #622

Open willow-ahrens opened 3 hours ago

willow-ahrens commented 3 hours ago

The idea is to store P separate copies of the output (one per processor) and reduce them all at the end. This requires changing "update" mode to include the reduction operator, so that the level knows how to combine the copies.
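
To make the proposal concrete, here is a minimal sketch of the separate-and-fuse pattern in plain Python. It is not Finch code and does not reflect how a Finch level would be implemented: the function name `separate_and_fuse`, its parameters, and the strided split of the work are all hypothetical, chosen only to illustrate the idea. The sketch assumes the reduction operator is associative and commutative and that `identity` is its identity element, since the updates are reordered across workers.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def separate_and_fuse(updates, size, num_workers, op, identity):
    """Apply (index, value) updates to an output of length `size`.

    Hypothetical helper for illustration only; not part of Finch.
    """
    # Separate: one private copy of the output per worker,
    # initialized to the identity element of `op`.
    copies = [[identity] * size for _ in range(num_workers)]

    def work(p):
        local = copies[p]
        # Worker p takes a strided slice of the updates and writes
        # only to its own copy, so no locking is needed.
        for i, v in updates[p::num_workers]:
            local[i] = op(local[i], v)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Consume the iterator so any worker exception is re-raised here.
        list(pool.map(work, range(num_workers)))

    # Fuse: combine the per-worker copies elementwise with the same operator.
    return [reduce(op, column, identity) for column in zip(*copies)]


updates = [(0, 1.0), (2, 2.0), (0, 3.0), (1, 4.0), (2, 5.0)]
result = separate_and_fuse(updates, 3, 2, operator.add, 0.0)
print(result)  # [4.0, 4.0, 7.0]
```

The fuse step is the reason the update needs to know the operator: merging the copies applies the same `op` that each worker used locally, here `operator.add`, and it would be `max` with identity `float("-inf")` for a max-reduction.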

willow-ahrens commented 3 hours ago

Depends on #608.