zheweijushi opened this issue 2 months ago
According to Issue #67, a single forward pass with B=1 requires approximately 119 GFLOPs (which, strictly speaking, is a count of MACs).
However, in the DiT code it seems that an additional B empty-class (null) samples are concatenated to the batch before attention is computed.
Therefore, when estimating the computational load of the DiT blocks, the effective batch size should be 2B.
Generating a 256x256 image should then require 1 (B=1) × 2 (the added empty-class copies) × 119 ≈ 238 GFLOPs per sampling step, is that right?
Is there a problem with my understanding? I hope you can clarify this. Thank you very much!
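For concreteness, here is a minimal sketch (not the actual DiT code) of the batch doubling I mean, assuming the classifier-free-guidance setup concatenates a null-class copy of each sample and that the null class id is `num_classes` (1000 for ImageNet), as in the DiT sampling script; the 119 GFLOPs figure is taken from Issue #67:

```python
import torch

# Minimal sketch of the classifier-free-guidance batch doubling described above.
# Assumptions: 256x256 images -> 32x32 latents, and the null class id equals
# num_classes (1000 for ImageNet), as in the DiT sampling script.
B, C, latent_size, num_classes = 1, 4, 32, 1000

z = torch.randn(B, C, latent_size, latent_size)   # conditional latents
y = torch.randint(0, num_classes, (B,))           # real class labels

# CFG: append a null-class copy of every sample, so the model forward pass
# (including every attention layer) actually runs on a batch of size 2*B.
z_cfg = torch.cat([z, z], dim=0)                        # shape (2B, 4, 32, 32)
y_cfg = torch.cat([y, torch.full((B,), num_classes)])   # shape (2B,)

flops_per_sample = 119e9                # ~119 GFLOPs per forward pass at B=1
flops_per_cfg_step = 2 * B * flops_per_sample
print(f"compute per sampling step with CFG: {flops_per_cfg_step / 1e9:.0f} GFLOPs")
```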
@wpeebles @s9xie @ictzyqq @void-main could you please kindly take a look, thank you very much!