DefTruth / lite.ai.toolkit

🛠 A lite C++ toolkit of awesome AI models, support ONNXRuntime, MNN. Contains YOLOv5, YOLOv6, YOLOX, YOLOv8, FaceDet, HeadSeg, HeadPose, Matting etc. Engine: ONNXRuntime, MNN.
https://github.com/DefTruth/lite.ai.toolkit
GNU General Public License v3.0
3.53k stars 672 forks

Results are poor after enabling Metal acceleration on macOS. How can this be improved? #395

Open xp19870106 opened 9 months ago

xp19870106 commented 9 months ago

I modified the following parts of the code. First, I set the type to Metal:

backend_config.precision = MNN::BackendConfig::Precision_High;
schedule_config.backendConfig = &backend_config;
schedule_config.type = MNN_FORWARD_METAL;
schedule_config.backupType = MNN_FORWARD_METAL;

Then I commented out the initialization code in void MNNRobustVideoMatting::initialize_context():

// resize session
mnn_interpreter->resizeSession(mnn_session);
// init 0.
// std::fill_n(r1i_tensor->host<float>(), r1i_size, 0.f);
// std::fill_n(r2i_tensor->host<float>(), r2i_size, 0.f);
// std::fill_n(r3i_tensor->host<float>(), r3i_size, 0.f);
// std::fill_n(r4i_tensor->host<float>(), r4i_size, 0.f);

Finally, I updated void MNNRobustVideoMatting::update_context(const std::map<std::string, MNN::Tensor *> &output_tensors):

void MNNRobustVideoMatting::update_context(const std::map<std::string, MNN::Tensor *> &output_tensors)
{
  auto device_r1o_ptr = output_tensors.at("r1o");
  auto device_r2o_ptr = output_tensors.at("r2o");
  auto device_r3o_ptr = output_tensors.at("r3o");
  auto device_r4o_ptr = output_tensors.at("r4o");
  MNN::Tensor *cpu1 = MNN::Tensor::createHostTensorFromDevice(device_r1o_ptr, true);
  MNN::Tensor *cpu2 = MNN::Tensor::createHostTensorFromDevice(device_r2o_ptr, true);
  MNN::Tensor *cpu3 = MNN::Tensor::createHostTensorFromDevice(device_r3o_ptr, true);
  MNN::Tensor *cpu4 = MNN::Tensor::createHostTensorFromDevice(device_r4o_ptr, true);

  device_r1o_ptr->copyFromHostTensor(cpu1);
  device_r2o_ptr->copyFromHostTensor(cpu2);
  device_r3o_ptr->copyFromHostTensor(cpu3);
  device_r4o_ptr->copyFromHostTensor(cpu4);

  // device_r1o_ptr->copyToHostTensor(r1i_tensor);
  // device_r2o_ptr->copyToHostTensor(r2i_tensor);
  // device_r3o_ptr->copyToHostTensor(r3i_tensor);
  // device_r4o_ptr->copyToHostTensor(r4i_tensor);

  context_is_update = true;
}

The final result is as follows:

Screenshot 2023-10-23 16 56 36
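
One thing that stands out in the update_context posted above: the host copies cpu1..cpu4 are written back into the same r1o..r4o tensors they came from, whereas the commented-out lines wrote into r1i_tensor..r4i_tensor. Combined with the commented-out zero fill in initialize_context, nothing in the posted code appears to write the r1i..r4i inputs. Whether that explains the poor Metal result is not confirmed in this thread. The sketch below only illustrates the intended data flow of recurrent state (outputs of one frame become inputs of the next), using stand-in types. FakeTensor, reset_state and carry_state are hypothetical names for illustration and are not part of MNN or lite.ai.toolkit.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a tensor. This is NOT the MNN::Tensor API;
// it only models "a buffer of floats that can be copied into another".
struct FakeTensor {
    std::vector<float> data;
    // Copy this tensor's contents into dst (direction: this -> dst).
    void copy_to(FakeTensor &dst) const { dst.data = data; }
};

// Zero-fill the recurrent inputs r1i..r4i before the first frame,
// mirroring the std::fill_n calls that were commented out above.
void reset_state(std::map<std::string, FakeTensor> &inputs, std::size_t size) {
    for (int i = 1; i <= 4; ++i) {
        inputs["r" + std::to_string(i) + "i"].data.assign(size, 0.f);
    }
}

// After each frame, carry every recurrent output rNo into the matching
// recurrent input rNi so the next frame sees the updated state.
void carry_state(const std::map<std::string, FakeTensor> &outputs,
                 std::map<std::string, FakeTensor> &inputs) {
    for (int i = 1; i <= 4; ++i) {
        const std::string idx = std::to_string(i);
        outputs.at("r" + idx + "o").copy_to(inputs.at("r" + idx + "i"));
    }
}
```

With a real MNN session, the equivalent step would presumably go through a host tensor and then into the input tensor, so the destination of the final copy is the part worth checking.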
github-actions[bot] commented 4 months ago

This issue is stale because it has been open for 30 days with no activity.