Xilinx / finn

Dataflow compiler for QNN inference on FPGAs
https://xilinx.github.io/finn
BSD 3-Clause "New" or "Revised" License

Error with Mobilenet V1: cycle-free graph violated: partition depends on itself #1199

Closed: senekor closed this issue 2 months ago

senekor commented 2 months ago

Quick summary

I'm trying to compile Mobilenet V1 with FINN and I'm getting the error message "cycle-free graph violated: partition depends on itself". This looks like an internal error, so I'm not sure what I can do to fix it.

Steps to Reproduce

  1. Clone the FINN repository
  2. Check out the dev branch at commit 7076ed3f
  3. Start the docker container with the command: ./run-docker.sh notebook
  4. Attempt to build dataflow with the following config:
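    import finn.builder.build_dataflow as build
    import finn.builder.build_dataflow_config as build_cfg

    # model_file and output_dir must be defined before this point
    cfg = build.DataflowBuildConfig(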
        output_dir          = output_dir,
        mvau_wwidth_max     = 80,
        target_fps          = 1,
        synth_clk_period_ns = 10.0,
        fpga_part           = "xc7z020clg400-1",
        steps               = build_cfg.hw_codegen_dataflow_steps + [
            "step_hw_ipgen",
            "step_set_fifo_depths",
            "step_create_stitched_ip",
        ],
        generate_outputs=[
            build_cfg.DataflowOutputType.STITCHED_IP,
        ]
    )
    build.build_dataflow_cfg(model_file, cfg)

    It should fail during step_create_dataflow_partition.

Expected behavior

No error messages; the stitched IP is created successfully.

Actual behavior

Error message during step_create_dataflow_partition:

AssertionError: cycle-free graph violated: partition depends on itself

Running step: step_create_dataflow_partition [5/14]
> /home/senk/repos/finn/deps/qonnx/src/qonnx/transformation/create_generic_partitions.py(119)apply()
    117                     if node is not None:
    118                         assert (
--> 119                             self.partitioning(node) != partition_id
    120                         ), """cycle-free graph violated: partition depends on itself"""
    121                         # print(node)

ONNX files

(renamed to .txt because GitHub rejected .onnx)

This was produced by taking the pretrained model from Brevitas and exporting it to ONNX.

from brevitas_examples.imagenet_classification.models import quant_mobilenet_v1_4b
model = quant_mobilenet_v1_4b()

mobilnet_v1.txt

I can't upload the intermediate model produced in step_convert_to_hw because, at 117 MB, it's too big.

auphelia commented 2 months ago

The error message "cycle-free graph violated: partition depends on itself" indicates that not all layers in the network were converted to HW layers. From the code you've shared, it looks like you are using the standard builder steps. MobileNet-v1 requires additional custom steps. You can find an example build flow in finn-examples: https://github.com/Xilinx/finn-examples/tree/main/build/mobilenet-v1

Or, if you would like to try a manual approach, you can have a look at our end2end test for MobileNet-v1: https://github.com/Xilinx/finn/blob/dev/tests/end2end/test_end2end_mobilenet_v1.py
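
To illustrate why an unconverted layer can trigger this assertion, here is a minimal toy sketch. It is not the qonnx implementation: it models the network as a linear chain and assumes that all HW layers end up in one partition. Under that assumption, a non-HW node sitting between two HW nodes both consumes from and feeds into the same partition, so the partition would depend on itself. The node names and the helper function are hypothetical.

```python
from typing import List, Tuple

# Hypothetical toy model: a linear chain of (node_name, is_hw_layer) pairs.
Chain = List[Tuple[str, bool]]


def partition_depends_on_itself(chain: Chain) -> bool:
    """Return True if a non-HW node sits between two HW nodes in the chain.

    Assumption: every HW node is assigned to the same partition, so such a
    node would be both a consumer and a producer for that partition.
    """
    hw_positions = [i for i, (_, is_hw) in enumerate(chain) if is_hw]
    if len(hw_positions) < 2:
        return False
    first, last = hw_positions[0], hw_positions[-1]
    # Any non-HW node inside the [first, last] span splits the partition.
    return any(not is_hw for _, is_hw in chain[first:last + 1])


# Non-HW nodes only at the edge of the chain: no cycle.
fully_converted = [
    ("Transpose_0", False),
    ("MVAU_0", True),
    ("Thresholding_0", True),
    ("MVAU_1", True),
]

# An unconverted Mul between two HW layers: the partition depends on itself.
partially_converted = [
    ("MVAU_0", True),
    ("Mul_0", False),
    ("MVAU_1", True),
]

print(partition_depends_on_itself(fully_converted))      # False
print(partition_depends_on_itself(partially_converted))  # True
```

Real graphs have branches, so this chain model only conveys the idea behind the check.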

To debug an error like this, you can open step_convert_to_hw_layers.onnx and check whether all nodes were converted to fpgadataflow nodes (this is indicated by the domain attribute). If some were not, see whether additional transformations from the library can convert the remaining layers.
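
The domain check described above could be scripted roughly as follows. This is a sketch under stated assumptions: it assumes converted nodes carry a domain starting with `finn.custom_op.fpgadataflow`, and the helper names `find_unconverted_nodes` and `report_unconverted` are made up for illustration. Only `onnx.load` and the `name`, `op_type`, and `domain` fields of ONNX nodes are taken from the `onnx` package.

```python
from types import SimpleNamespace

# Assumption: converted HW layers use a domain that starts with this prefix.
HW_DOMAIN_PREFIX = "finn.custom_op.fpgadataflow"


def find_unconverted_nodes(nodes, hw_domain_prefix=HW_DOMAIN_PREFIX):
    """Return (name, op_type, domain) for every node outside the HW domain."""
    return [
        (n.name, n.op_type, n.domain)
        for n in nodes
        if not n.domain.startswith(hw_domain_prefix)
    ]


def report_unconverted(onnx_path):
    """Load an intermediate ONNX model and print the nodes left unconverted."""
    import onnx  # imported lazily so the helper above works without onnx

    model = onnx.load(onnx_path)
    leftovers = find_unconverted_nodes(model.graph.node)
    for name, op_type, domain in leftovers:
        print(f"{op_type:20s} name={name!r} domain={domain!r}")
    return leftovers


# Stand-in nodes with the same fields as ONNX NodeProto, for a quick dry run.
demo_nodes = [
    SimpleNamespace(name="MVAU_0", op_type="MVAU", domain="finn.custom_op.fpgadataflow"),
    SimpleNamespace(name="Mul_0", op_type="Mul", domain=""),
]
print(find_unconverted_nodes(demo_nodes))  # [('Mul_0', 'Mul', '')]
```

On a real build, `report_unconverted` would be called with the path to the intermediate model in the build output directory. Non-HW nodes at the very start or end of the graph may be expected; the ones to look at are those surrounded by HW layers.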

senekor commented 2 months ago

Thanks a lot! I was able to make it work with the finn-examples repo as reference.