[NPU] Add environment variables for the slice & split aclnn kernels

The slice kernel is faster on the aclnn path in scenarios with repeated compilation, and faster on the aclop path in scenarios without compilation, so the path can now be selected with an environment variable. For LLMs, leave the flag at False (the default). For small models, use export FLAGS_npu_scale_aclnn=True.
The scale kernel has a precision problem in the LLM scenario whose cause has not yet been located. An environment variable is provided to work around it. For small models, use export FLAGS_npu_split_aclnn=True.
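
As a rough illustration of the settings described above, a launch script might choose the flags by model type. This is only a sketch: MODEL_KIND is a hypothetical variable introduced here and is not part of Paddle, and the LLM branch assumes that leaving both flags unset gives the default (False) behavior.

```shell
#!/bin/sh
# MODEL_KIND is a hypothetical selector used only in this sketch.
MODEL_KIND="small"

if [ "$MODEL_KIND" = "small" ]; then
    # Small models: opt in to the aclnn path for both kernels.
    export FLAGS_npu_scale_aclnn=True
    export FLAGS_npu_split_aclnn=True
else
    # LLMs: leave both flags unset so the defaults apply (assumed to be False).
    unset FLAGS_npu_scale_aclnn
    unset FLAGS_npu_split_aclnn
fi

echo "FLAGS_npu_scale_aclnn=${FLAGS_npu_scale_aclnn:-<default>}"
echo "FLAGS_npu_split_aclnn=${FLAGS_npu_split_aclnn:-<default>}"
```

The flags need to be exported in the same shell that starts the training or inference process so that the process inherits them.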

PaddlePaddle / PaddleCustomDevice