[Performance Issue]: The new OneFlow update no longer speeds up inference on RTX 3090 and 4090

Brief Description

Hello, we created a Python environment that gave good inference speed for Stable Diffusion with the OneFlow diffusers fork (about 30 it/s) on my RTX 3090. The installed OneFlow version was oneflow-0.9.1.dev20230212+cu117, and I was using the oneflow-fork branch of diffusers. Since the new OneFlow update, speed has dropped considerably on the 3090 and 4090, while it is unchanged on an A100. I cannot find the oneflow-0.9.1.dev20230212+cu117 build anywhere, so I cannot reproduce the earlier speed on my local 3090 machine. My question is: why do the new OneFlow version and OneDiff no longer give a speedup on 3090 and 4090 machines? Thank you very much.

Device and Context

Benchmarks were run on my local RTX 3090 machine and in the cloud.

Benchmark

Previously it ran at about 30 iterations/sec at 512x512 resolution with the Stable Diffusion 1.5 model on the 3090. Now it runs at 9-10 iterations/sec, the same as plain diffusers, which is roughly a 3x slowdown.
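To make the it/s figures easier to compare across machines and OneFlow versions, a minimal timing sketch like the one below could be used. It assumes `step` is a hypothetical callable that runs one denoising iteration and blocks until the GPU work has finished. GPU kernels launch asynchronously, so without a synchronize call inside `step` (for example `torch.cuda.synchronize()` in PyTorch) the measured rate would be misleading. The untimed warm-up calls keep one-off costs, such as any graph compilation on the first run, out of the result.

```python
import time
from typing import Callable


def iterations_per_second(
    step: Callable[[], None], num_steps: int = 50, warmup_steps: int = 5
) -> float:
    """Return the average rate (iterations/sec) of `num_steps` calls to `step`.

    `step` is a hypothetical stand-in for one denoising iteration; it should
    block until its GPU work is done so the wall-clock timing is meaningful.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")

    # Warm-up calls are not timed, so one-off startup costs
    # (e.g. compilation or memory allocation) do not skew the rate.
    for _ in range(warmup_steps):
        step()

    start = time.perf_counter()
    for _ in range(num_steps):
        step()
    elapsed = time.perf_counter() - start

    # Guard against a zero interval on extremely fast dummy steps.
    return num_steps / elapsed if elapsed > 0 else float("inf")
```

Running the same harness against both OneFlow builds, with identical resolution and step count, would give directly comparable numbers for the 3090, 4090 and A100.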

Alternatives

No response

Oneflow-Inc / oneflow