This feature allows the output data of a layer to remain in the low-level memory when possible. This enables direct use of the output data by the next layer, eliminating the need for data to travel to/from the top memory level.
Note:
This feature applies only to layers on the same branch.
By default, it is assumed that the initial input and final output of the entire network can be generated from the low-level memory. If your case requires them to travel to/from the top memory level, you can change this assumption by setting the workload_data_always_from_top_mem parameter to True in the run() function within the SearchNoUseMemStage.py file.
New stages:
SearchNoUseMemStage: This stage searches for unnecessary top memory levels for each layer and generates a pointer indicating which level the output data of each layer should travel up to.
RemoveNoUseMemStage: This stage removes memory instances with a level higher than the pointer.
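To make the pointer idea concrete, here is a minimal toy sketch of the two steps. It is not the code of the actual stages: `ToyLayer`, `search_unused_levels`, `remove_unused_levels`, and the capacity-based rule are all hypothetical, and the sketch models only layer outputs (the real flag also covers the initial input). It assumes memory levels are ordered from lowest (index 0) to top.

```python
from dataclasses import dataclass


@dataclass
class ToyLayer:
    # Hypothetical stand-in for a layer; not a ZigZag class.
    name: str
    output_bits: int  # size of the layer's output data


def search_unused_levels(layers, level_capacities_bits,
                         workload_data_always_from_top_mem=False):
    """Return, per layer, the index of the highest level its output must reach.

    Assumption: an output can stay at the lowest level that can hold it.
    """
    top = len(level_capacities_bits) - 1
    pointers = {}
    for i, layer in enumerate(layers):
        # Lowest level whose capacity fits the output; fall back to the top level.
        pointer = next(
            (lvl for lvl, cap in enumerate(level_capacities_bits)
             if layer.output_bits <= cap),
            top,
        )
        # Optionally force the final output of the network up to the top level.
        if i == len(layers) - 1 and workload_data_always_from_top_mem:
            pointer = top
        pointers[layer.name] = pointer
    return pointers


def remove_unused_levels(level_names, pointer):
    """Keep only the memory levels at or below the pointer."""
    return level_names[: pointer + 1]


levels = ["reg", "sram", "dram"]          # illustrative level names
capacities = [64, 1024, 10**9]            # illustrative capacities in bits
layers = [ToyLayer("conv1", 512), ToyLayer("conv2", 32)]

pointers = search_unused_levels(layers, capacities)
kept_for_conv1 = remove_unused_levels(levels, pointers["conv1"])
```

In this toy setup the output of `conv1` only needs to travel up to `sram`, so `dram` is dropped for that layer; setting the flag to True pushes the output of the last layer back up to the top level.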
How to use:
Place the SearchNoUseMemStage before the WorkloadStage, and place the RemoveNoUseMemStage after the WorkloadStage.
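The ordering rule can be sketched with plain stage names. The helper `place_unused_memory_stages` and the surrounding stage names ("ParserStage", "MappingStage", "CostModelStage") are placeholders for illustration only; in a real pipeline you would place the actual stage classes in the same relative order.

```python
def place_unused_memory_stages(
    stage_names,
    search="SearchNoUseMemStage",
    remove="RemoveNoUseMemStage",
    anchor="WorkloadStage",
):
    """Return a new stage list with `search` right before and `remove` right after `anchor`."""
    idx = stage_names.index(anchor)  # raises ValueError if the anchor stage is missing
    return stage_names[:idx] + [search, anchor, remove] + stage_names[idx + 1:]


# Placeholder pipeline; only WorkloadStage is taken from the description above.
pipeline = ["ParserStage", "WorkloadStage", "MappingStage", "CostModelStage"]
new_pipeline = place_unused_memory_stages(pipeline)
```

The helper returns a new list and leaves the original pipeline untouched, so the same base pipeline can be reused with and without the feature.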
Example:
An example function, get_hardware_performance_zigzag_unused_mem_removing, is provided in api.py for reference.
SearchNoUseMemStage has been renamed to SearchUnusedMemoryStage, and RemoveNoUseMemStage has been renamed to RemoveUnusedMemoryStage.
The code in the stages above has been reformatted with black.
More comments have been added to the new stages, including explicit labels stating whether an operand uses the layer representation or the memory representation.
pytest cases are added under /tests/main/test_without_unused_memory/. Original pytest cases are moved to /tests/main/test_origin/.
Extra note:
When a user-defined workload (.py) is used and the workload contains Adder layers, the estimate made by SearchUnusedMemoryStage for these Adder layers is more pessimistic: the topmost memory level chosen for their output may be higher than strictly necessary. (For details, please refer to the comments in SearchUnusedMemoryStage.)
In all other cases (.onnx workloads or non-Adder layers), this issue does not occur.