For the slice kernel, the aclnn path is faster in scenarios with repeated compilation, while the aclop path is faster in scenarios without compilation. The path is selected with an environment variable. For LLMs, leave the flag at False (the default). For small models, run export FLAGS_npu_scale_aclnn=True.
In the LLM scenario, the scale kernel has a precision issue whose root cause has not yet been identified. An environment variable is provided to work around it. For small models, run export FLAGS_npu_split_aclnn=True.
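
As a minimal sketch of the small-model setup described above, the two flags can be exported in the shell that launches the job. The flag names and values are taken from the text. The final check and the commented-out unset line are illustrative additions, and the sketch assumes that leaving a flag unset gives the default behavior recommended for LLMs.

```shell
# Small-model setup: enable both flags named in the text.
export FLAGS_npu_scale_aclnn=True
export FLAGS_npu_split_aclnn=True

# LLM setup (assumption: an unset flag falls back to its default):
# unset FLAGS_npu_scale_aclnn FLAGS_npu_split_aclnn

# Show which FLAGS_npu_* variables child processes will inherit.
env | grep '^FLAGS_npu_'
```

Because the flags are exported, they apply only to processes started from this shell after the export. Set them before launching the training or inference script.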