Check the vectorization factor of shared memory consumers to avoid illegal vectorization sizes

liqiangxl commented 4 weeks ago

Issue: The InnerOuter persistent scheduler uses shared memory to store persistent buffers. The data flow is: input in gmem --> async copy to smem --> vectorized load to registers (the smem consumers). Each --> is simply a LoadStoreOp, and both copies use the same vectorization factor. CI found a case where the shared memory persistent buffers have a data type of fp32 while the inputs are fp16 (when there are view ops, projection to inputs is not used). The vectorization factor is set to 8, which is 16 bytes for the fp16 inputs but 8 x 4 = 32 bytes when loading the fp32 buffers from shared memory to registers.

Changes: (1) Added code to handle the vectorization of smem consumers: an additional split is applied if the smem --> regs copy would lead to a vectorization larger than 16 bytes. (2) Added a test.
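
The arithmetic behind the extra split can be sketched as follows. This is only an illustration of the size check, not the code added in this PR: `clampVectorizationFactor`, `extraSplitOuterExtent`, and `kMaxVectorizeBytes` are hypothetical names, and the sketch assumes power-of-two factors and data type sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical constant: widest vectorized access allowed, in bytes.
constexpr int64_t kMaxVectorizeBytes = 16;

// Hypothetical helper (not nvFuser API): largest factor, in elements, that
// does not exceed the requested factor and keeps the access <= 16 bytes.
int64_t clampVectorizationFactor(int64_t factor, int64_t dtype_size_bytes) {
  assert(factor > 0 && dtype_size_bytes > 0);
  // Never go below 1 element, even for very wide data types.
  const int64_t max_factor =
      std::max<int64_t>(1, kMaxVectorizeBytes / dtype_size_bytes);
  return std::min(factor, max_factor);
}

// Hypothetical helper: extent of the outer domain produced by the extra
// split. A result of 1 means no extra split is needed.
int64_t extraSplitOuterExtent(int64_t factor, int64_t dtype_size_bytes) {
  return factor / clampVectorizationFactor(factor, dtype_size_bytes);
}
```

For the failing case, a factor of 8 on fp32 (4 bytes) is clamped to 4, so the vectorized domain of extent 8 would be split into an outer extent of 2 and an inner vectorized extent of 4 (16 bytes). For the fp16 inputs (2 bytes) the factor stays at 8 and no extra split is needed.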

Results: All vectorized accesses are now <= 16 bytes.

Follow-up work: see issue https://github.com/NVIDIA/Fuser/issues/3272

liqiangxl commented 4 weeks ago

!build
