cumulo-autumn / StreamDiffusion

StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
Apache License 2.0

Different FPS with same model and parameters. #154

Open olegchomp opened 3 months ago

olegchomp commented 3 months ago

The difference is around 5 FPS, tested with sd_turbo and a batch size of 1. When I inspected the TensorRT engines, I found that the faster engine contains different layer information. The engine files also differ in size: the slower engine is 2053166 KB and the faster one is 2079951 KB (about 26 MB larger), and the VAE engines differ by 100 KB as well. The faster result was achieved only once; since then, every engine I build runs at the slower speed. I even tried a clean venv.
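With a gap of only about 5 FPS between builds, measurement noise can matter as much as the engine itself. A minimal timing sketch follows, assuming the generation call can be wrapped as a zero-argument function that produces one frame per call (batch size 1). `measure_fps` and its parameters are hypothetical helpers, not part of StreamDiffusion. For GPU work, passing `torch.cuda.synchronize` as `sync` keeps asynchronous CUDA launches from inflating the result.

```python
import time
from typing import Callable, Optional


def measure_fps(
    step: Callable[[], object],
    iters: int = 100,
    warmup: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the average frames per second of `step` over `iters` calls.

    Assumes one call to `step` yields one frame (batch size 1).
    `sync` should block until queued GPU work has finished
    (e.g. torch.cuda.synchronize); it is called before and after timing.
    """
    if iters <= 0:
        raise ValueError("iters must be positive")
    # Warm-up calls absorb one-off costs (lazy init, allocator growth, caches).
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return iters / elapsed


if __name__ == "__main__":
    # Hypothetical stand-in for one generation step (~10 ms per frame).
    fps = measure_fps(lambda: time.sleep(0.01), iters=20, warmup=2)
    print(f"{fps:.1f} FPS")
```

Running the same helper with identical `iters` and `warmup` against each engine makes the two FPS numbers directly comparable.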

Slower engine:

{"Layers": [{
  "Name": "/conv_in/Cast",
  "LayerType": "NoOp",
  "Inputs": [
  {
    "Name": "sample",
    "Location": "Device",
    "Dimensions": [1,4,64,64],
    "Format/Datatype": "Row major linear FP32"
  }],
  "Outputs": [
  {
    "Name": "/conv_in/Cast_output_0",
    "Location": "Device",
    "Dimensions": [1,4,64,64],
    "Format/Datatype": "Row major linear FP32"
  }],
  "TacticValue": "0x0000000000000000",
  "StreamId": 0,
  "Metadata": ""
}

Faster engine:

{"Layers": [{
  "Name": "/conv_in/Cast",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "sample",
    "Location": "Device",
    "Dimensions": [1,4,64,64],
    "Format/Datatype": "Row major linear FP32"
  }],
  "Outputs": [
  {
    "Name": "/conv_in/Cast_output_0",
    "Location": "Device",
    "Dimensions": [1,4,64,64],
    "Format/Datatype": "Channel major FP16 format where channel % 8 == 0"
  }],
  "ParameterType": "Reformat",
  "Origin": "CAST",
  "TacticValue": "0x00000000000003e8",
  "StreamId": 0,
  "Metadata": ""
}
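To find every layer where the two builds diverge, not just `/conv_in/Cast`, the full layer info of both engines can be exported and diffed. With TensorRT, this JSON can come from `trtexec --exportLayerInfo=<file> --profilingVerbosity=detailed`, or from `engine.create_engine_inspector().get_engine_information(trt.LayerInformationFormat.JSON)` in the Python API when the engine was built with detailed profiling verbosity. The script is a minimal sketch that assumes that layout (a top-level "Layers" list of dicts keyed by "Name", as in the excerpts above); `diff_layers` is a hypothetical helper, not part of StreamDiffusion or TensorRT.

```python
import json
import sys
from typing import Any, Dict, List


def _layers_by_name(info: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    # With detailed verbosity each entry is a dict; with names-only
    # verbosity entries are plain strings, which are skipped here.
    return {
        layer["Name"]: layer
        for layer in info.get("Layers", [])
        if isinstance(layer, dict) and "Name" in layer
    }


def _formats(tensors: List[Dict[str, Any]]) -> List[str]:
    # Collect the "Format/Datatype" string of each input or output tensor.
    return [t.get("Format/Datatype", "?") for t in tensors]


def diff_layers(slow: Dict[str, Any], fast: Dict[str, Any]) -> List[str]:
    """Describe layers whose type, tactic or I/O formats differ between two engines."""
    a, b = _layers_by_name(slow), _layers_by_name(fast)
    report: List[str] = []
    for name in sorted(a.keys() | b.keys()):
        if name not in a or name not in b:
            where = "fast" if name in b else "slow"
            report.append(f"{name}: only in {where} engine")
            continue
        la, lb = a[name], b[name]
        for key in ("LayerType", "TacticValue"):
            if la.get(key) != lb.get(key):
                report.append(f"{name}: {key} {la.get(key)!r} -> {lb.get(key)!r}")
        for io in ("Inputs", "Outputs"):
            fa, fb = _formats(la.get(io, [])), _formats(lb.get(io, []))
            if fa != fb:
                report.append(f"{name}: {io} {fa} -> {fb}")
    return report


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_layers.py slow_layers.json fast_layers.json
    with open(sys.argv[1]) as f_slow, open(sys.argv[2]) as f_fast:
        for line in diff_layers(json.load(f_slow), json.load(f_fast)):
            print(line)
```

Fed the two excerpts above, it would report that `/conv_in/Cast` changed from a `NoOp` that keeps FP32 to a `Reformat` that emits FP16, with a different tactic. Because TensorRT selects tactics partly from timing measurements taken during the build, differences like this between two builds of the same model are plausible and worth listing in full.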