hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionality beyond that of a traditional BLAS library.
There are three tracks for the sparse MM, so we need to make sure the required pack items are popped in each iteration.
[idea]
Use three separate pack pools to store the pack instructions of A, B, and Metadata.
Step 1: insert only the required packs into the code (the number of required packs may differ for each mfma iteration).
Then check whether the inserted packs fulfill instPerPack; if not, insert the next pack instructions until it is satisfied.
Step 2: if there is still room before the mfma, insert the next pack instructions (the number of instructions inserted equals instPerPack).
Step 3: put another pack or an SNop before the mfma instruction according to the latency needed. The inserted combination may be 2 packs, 1 pack + snop 0, or snop 1.
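
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the actual kernel-writer code: the names `ORDER`, `pop_next`, and `schedule_before_mfma`, the boolean `has_room`, and the use of plain strings for instructions are all hypothetical. It also assumes that each pack instruction covers one wait state and that `s_nop N` covers N + 1 wait states, which is consistent with the "2 packs, 1 pack + snop 0, or snop 1" combinations listed above.

```python
from collections import deque

ORDER = ("A", "B", "Metadata")  # hypothetical names for the three pack pools


def pop_next(pools):
    """Pop the next pack instruction from the first non-empty pool, or None."""
    for name in ORDER:
        if pools[name]:
            return pools[name].popleft()
    return None


def schedule_before_mfma(pools, required, inst_per_pack, has_room, latency):
    """Return the instructions to emit before one mfma (illustrative only).

    pools:         dict of pool name -> deque of pack instructions
    required:      dict of pool name -> packs this mfma iteration must have
    inst_per_pack: minimum number of pack instructions per insertion
    has_room:      whether free slots remain before the mfma (assumed input)
    latency:       wait states still needed right before the mfma
    """
    out = []

    # Step 1: pop the required packs (raises IndexError if a pool runs dry),
    # then top up until inst_per_pack instructions have been inserted.
    for name in ORDER:
        for _ in range(required.get(name, 0)):
            out.append(pools[name].popleft())
    while len(out) < inst_per_pack:
        inst = pop_next(pools)
        if inst is None:
            break
        out.append(inst)

    # Step 2: if there is room, insert inst_per_pack more pack instructions.
    if has_room:
        for _ in range(inst_per_pack):
            inst = pop_next(pools)
            if inst is None:
                break
            out.append(inst)

    # Step 3: cover the remaining latency with packs, then with one s_nop.
    remaining = latency
    while remaining > 0:
        inst = pop_next(pools)
        if inst is None:
            break
        out.append(inst)
        remaining -= 1
    if remaining > 0:
        out.append(f"s_nop {remaining - 1}")
    return out
```

With a latency of 2, the sketch emits two packs when at least two are left in the pools, one pack followed by `s_nop 0` when only one is left, and a single `s_nop 1` when the pools are empty.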