KULeuven-MICAS / zigzag

HW Architecture-Mapping Design Space Exploration Framework for Deep Learning Accelerators
https://kuleuven-micas.github.io/zigzag/
MIT License

Error when running spatial mapping search with Edge TPU example: The MAC level unit count is not the same for all operand #34

Closed (chensy7 closed this issue 10 months ago)

chensy7 commented 10 months ago

Hello ZigZag team,

I'm running a spatial mapping search on the Edge TPU example using ZigZag, and I get the following error:

2023-11-08 13:38:22,270 - generate_layer_node_for_conv +208 - INFO - Parsed Conv node 
2023-11-08 13:38:22,270 - generate_layer_node_for_conv +208 - INFO - Parsed Conv node 
2023-11-08 13:38:22,270 - generate_layer_node_for_conv +208 - INFO - Parsed Conv node 
2023-11-08 13:38:22,270 - generate_layer_node_for_conv +208 - INFO - Parsed Conv node 
2023-11-08 13:38:22,271 - generate_layer_node_for_conv +208 - INFO - Parsed Conv node 
2023-11-08 13:38:22,271 - generate_layer_node_for_gemm +143 - INFO - Parsed Gemm node 
2023-11-08 13:38:22,271 - generate_layer_node_for_gemm +143 - INFO - Parsed Gemm node 
2023-11-08 13:38:22,271 - generate_layer_node_for_gemm +143 - INFO - Parsed Gemm node 
2023-11-08 13:38:22,271 - parse_workload_from_onnx_model_and_mapping +111 - INFO - Created ONNXWorkload graph with 24 nodes and 23 edges.
2023-11-08 13:38:22,272 - parse_accelerator_from_path +52 - INFO - Parsed accelerator with cores [1].
2023-11-08 13:38:22,272 - run +29 - INFO - Processing layer 0...
2023-11-08 13:38:22,272 - run +97 - INFO - User-provided spatial mappings or hints not found. Auto-generating spatial_mapping_hint..
2023-11-08 13:38:22,272 - run +132 - INFO - Launching spatial mapping 1/16: {'D1': ('K', 8), 'D2': ('FX', 8), 'D3': ('OX', 4), 'D4': ('OX', 4)}.
2023-11-08 13:38:22,272 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 859.45it/s]
2023-11-08 13:38:23,111 - run +132 - INFO - Launching spatial mapping 2/16: {'D1': ('K', 8), 'D2': ('FX', 8), 'D3': ('OX', 4), 'D4': ('OY', 4)}.
2023-11-08 13:38:23,112 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 848.91it/s]
2023-11-08 13:38:23,960 - run +132 - INFO - Launching spatial mapping 3/16: {'D1': ('K', 8), 'D2': ('FX', 8), 'D3': ('OY', 4), 'D4': ('OX', 4)}.
2023-11-08 13:38:23,960 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 859.84it/s]
2023-11-08 13:38:24,798 - run +132 - INFO - Launching spatial mapping 4/16: {'D1': ('K', 8), 'D2': ('FX', 8), 'D3': ('OY', 4), 'D4': ('OY', 4)}.
2023-11-08 13:38:24,798 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 873.07it/s]
2023-11-08 13:38:25,623 - run +132 - INFO - Launching spatial mapping 5/16: {'D1': ('K', 8), 'D2': ('FY', 8), 'D3': ('OX', 4), 'D4': ('OX', 4)}.
2023-11-08 13:38:25,623 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 866.16it/s]
2023-11-08 13:38:26,455 - run +132 - INFO - Launching spatial mapping 6/16: {'D1': ('K', 8), 'D2': ('FY', 8), 'D3': ('OX', 4), 'D4': ('OY', 4)}.
2023-11-08 13:38:26,455 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 857.06it/s]
2023-11-08 13:38:27,296 - run +132 - INFO - Launching spatial mapping 7/16: {'D1': ('K', 8), 'D2': ('FY', 8), 'D3': ('OY', 4), 'D4': ('OX', 4)}.
2023-11-08 13:38:27,296 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 844.54it/s]
2023-11-08 13:38:28,149 - run +132 - INFO - Launching spatial mapping 8/16: {'D1': ('K', 8), 'D2': ('FY', 8), 'D3': ('OY', 4), 'D4': ('OY', 4)}.
2023-11-08 13:38:28,149 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 867.52it/s]
2023-11-08 13:38:28,979 - run +132 - INFO - Launching spatial mapping 9/16: {'D1': ('OX', 8), 'D2': ('FX', 8), 'D3': ('OX', 4), 'D4': ('OX', 4)}.
2023-11-08 13:38:28,980 - run +72 - INFO - Running temporal mapping search engine...
  0%|                                                   | 0/720 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/kmdl/test/zigzag/run.py", line 15, in <module>
    energy, latency, cme = get_hardware_performance_zigzag(workload=workload,
  File "/home/kmdl/test/zigzag/zigzag/api.py", line 73, in get_hardware_performance_zigzag
    answers = mainstage.run()
  File "/home/kmdl/test/zigzag/zigzag/classes/stages/Stage.py", line 49, in run
    for cme, extra_info in self.list_of_callables[0](
  File "/home/kmdl/test/zigzag/zigzag/classes/stages/ONNXModelParserStage.py", line 28, in run
    for cme, extra_info in sub_stage.run():
  File "/home/kmdl/test/zigzag/zigzag/classes/stages/MainInputParserStages.py", line 21, in run
    for cme, extra_info in sub_stage.run():
  File "/home/kmdl/test/zigzag/zigzag/classes/stages/SaveStage.py", line 88, in run
    for id, (cme, extra_info) in enumerate(substage.run()):
  File "/home/kmdl/test/zigzag/zigzag/classes/stages/SaveStage.py", line 137, in run
    for id, (cme, extra_info) in enumerate(substage.run()):
  File "/home/kmdl/test/zigzag/zigzag/classes/stages/ReduceStages.py", line 122, in run
    for cme, extra_info in substage.run():
  File "/home/kmdl/test/zigzag/zigzag/classes/stages/WorkloadStage.py", line 31, in run
    for cme, extra_info in sub_stage.run():
  File "/home/kmdl/test/zigzag/zigzag/classes/stages/SaveStage.py", line 32, in run
    for id, (cme, extra_info) in enumerate(substage.run()):
  File "/home/kmdl/test/zigzag/zigzag/classes/stages/ReduceStages.py", line 97, in run
    for cme, extra_info in substage.run():
  File "/home/kmdl/test/zigzag/zigzag/classes/stages/SpatialMappingGeneratorStage.py", line 145, in run
    for cme, extra_info in spatial_mapping_conversion_stage.run():
  File "/home/kmdl/test/zigzag/zigzag/classes/stages/SpatialMappingConversionStage.py", line 82, in run
    for cme, extra_info in sub_stage.run():
  File "/home/kmdl/test/zigzag/zigzag/classes/stages/ReduceStages.py", line 97, in run
    for cme, extra_info in substage.run():
  File "/home/kmdl/test/zigzag/zigzag/classes/stages/LomaStage.py", line 52, in run
    for cme, extra_info in sub_stage.run():
  File "/home/kmdl/test/zigzag/zigzag/classes/stages/CostModelStage.py", line 54, in run
    self.cme = CostModelEvaluation(
  File "/home/kmdl/test/zigzag/zigzag/classes/cost_model/cost_model.py", line 261, in __init__
    self.mapping_int = Mapping(
  File "/home/kmdl/test/zigzag/zigzag/classes/mapping/combined_mapping.py", line 212, in __init__
    self.spatial_mapping = SpatialMapping(spatial_mapping, layer_node)
  File "/home/kmdl/test/zigzag/zigzag/classes/mapping/spatial/spatial_mapping.py", line 30, in __init__
    self.calc_unit_count()
  File "/home/kmdl/test/zigzag/zigzag/classes/mapping/spatial/spatial_mapping.py", line 122, in calc_unit_count
    assert all(
AssertionError: The MAC level unit count is not the same for all operand [690, 720, 690], please correct the spatial mapping.

Here is the code to reproduce this run:

from zigzag.api import get_hardware_performance_zigzag

opt = 'EDP'
model = "alexnet"
onnx_model_path = f"zigzag/inputs/examples/workload/{model}.onnx"
workload = onnx_model_path

hwarch = "Edge_TPU_like"
mapping = "zigzag.inputs.examples.mapping.default"
accelerator = f"zigzag.inputs.examples.hardware.{hwarch}"

dump_filename_pattern = f"outputs/{hwarch}-{model}-layer_?.json"
pickle_filename = f"outputs/{hwarch}-{model}-saved_list_of_cmes.pickle"

energy, latency, cme = get_hardware_performance_zigzag(workload=workload,
                                                       accelerator=accelerator,
                                                       mapping=mapping,
                                                       opt=opt,
                                                       dump_filename_pattern=dump_filename_pattern,
                                                       pickle_filename=pickle_filename)
print(f"Total network energy = {energy:.2e} pJ")
print(f"Total network latency = {latency:.2e} cycles")
print(f"Total edp = {energy*latency:.2e} pJ*cycles")

and the mapping file I used is simply:

mapping = {
    "default": {
        "core_allocation": 1,
        "memory_operand_links": {"O": "O", "W": "I2", "I": "I1"},
    },
    "Add": {
        "core_allocation": 1,
        "memory_operand_links": {"O": "O", "X": "I2", "Y": "I1"},
    },
    "Pooling": {
        "core_allocation": 1,
        "memory_operand_links": {"O": "O", "W": "I2", "I": "I1"},
    },
}

This seems to happen for all examples with 4 levels of MACs (Edge TPU, Tesla NPU, Meta and Ascend) running AlexNet/ResNet18/MobileNetV2, but not for the TPU example with 2 levels. Also, FWIW, this started after I pulled from the ZigZag repo yesterday. I don't get this error with a local copy of ZigZag timestamped Sep 25th.

Am I doing something wrong? Any help is appreciated.

Thanks!
Siyuan

JiacongSun commented 10 months ago

Hi Siyuan,

Thank you for bringing up this issue.

We've identified that the problem arises when a layer dimension is mapped to multiple hardware dimensions. I'm pleased to inform you that we have addressed and resolved this issue in the latest update.
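To make the condition concrete: the mapping that failed in the log (spatial mapping 9/16) assigns OX to D1, D3 and D4. The snippet below is only an illustrative sketch for spotting that pattern in a spatial mapping dictionary. The helper `repeated_layer_dims` is hypothetical and not part of ZigZag, and it does not reproduce ZigZag's own unit-count calculation.

```python
from collections import defaultdict
from math import prod


def repeated_layer_dims(spatial_mapping):
    """Return the layer dimensions that appear on more than one hardware dimension.

    Hypothetical helper for illustration only; not a ZigZag API.
    Input format follows the log: {hw_dim: (layer_dim, unrolling_factor)}.
    """
    entries_per_layer_dim = defaultdict(list)
    for hw_dim, (layer_dim, unroll) in spatial_mapping.items():
        entries_per_layer_dim[layer_dim].append((hw_dim, unroll))
    # Keep only the layer dimensions that are unrolled on several hardware dimensions.
    return {
        layer_dim: entries
        for layer_dim, entries in entries_per_layer_dim.items()
        if len(entries) > 1
    }


# Spatial mapping 9/16, copied from the log in the original report.
failing_mapping = {"D1": ("OX", 8), "D2": ("FX", 8), "D3": ("OX", 4), "D4": ("OX", 4)}

repeats = repeated_layer_dims(failing_mapping)
for layer_dim, entries in repeats.items():
    hw_dims = [hw_dim for hw_dim, _ in entries]
    total_unroll = prod(unroll for _, unroll in entries)
    print(f"{layer_dim} is unrolled on {hw_dims}, combined factor {total_unroll}")
```

Note that mappings 1 to 8 in the log also place OX or OY on two hardware dimensions and completed without error, so a repeated layer dimension is a hint about where to look rather than a guaranteed failure.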

Regarding the mapping file: I believe that in the ZigZag version released in September, the spatial_mapping dictionary had to be provided (see inputs/mapping/default.py for an example). In the current version this requirement has been relaxed, and the spatial mapping is generated automatically if it is omitted. In that case, it is advisable to provide the spatial_mapping_hint dictionary instead (see tests/main/test_with_mix_spatial_mapping/test_tesla_npu_like.py for an example). Omitting both dictionaries results in a fully flexible spatial mapping search space, which may not accurately represent a real hardware system.
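As a rough sketch of what such a hint could look like, the "default" entry of the mapping file from the original report could be extended as follows. The key name spatial_mapping_hint comes from this comment; the value format (each hardware dimension mapped to a list of layer dimensions allowed on it) and the particular dimension choices are assumptions made for illustration, so the referenced test file remains the authoritative source for the format.

```python
# Hypothetical mapping file. The value format of "spatial_mapping_hint" is an
# assumption; check the test file referenced above for the real format.
mapping = {
    "default": {
        "core_allocation": 1,
        # Assumed format: hardware dimension -> layer dimensions allowed on it.
        "spatial_mapping_hint": {
            "D1": ["K"],
            "D2": ["C"],
            "D3": ["OX"],
            "D4": ["OY"],
        },
        "memory_operand_links": {"O": "O", "W": "I2", "I": "I1"},
    },
}

# Sanity check on this example: no layer dimension is offered to more than
# one hardware dimension, which narrows the search space considerably.
hint = mapping["default"]["spatial_mapping_hint"]
offered_dims = [layer_dim for dims in hint.values() for layer_dim in dims]
assert len(offered_dims) == len(set(offered_dims))
```

Restricting each hardware dimension to a single layer dimension is the narrowest possible choice; listing several candidates per hardware dimension would widen the search again.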

Please rerun your script with the latest ZigZag version and let us know if you encounter any further issues.

Best regards, Jiacong

chensy7 commented 10 months ago

Hi Jiacong,

Thanks for your reply and for taking a look at the issue. I can verify that the bug has been fixed on my end too. I'm closing the issue.

Siyuan