After enabling Metal acceleration on macOS, the results are poor. How can I improve this?

I modified the following parts of the code. First, I set the forward type to Metal:

```cpp
backend_config.precision = MNN::BackendConfig::Precision_High;
schedule_config.backendConfig = &backend_config;
schedule_config.type = MNN_FORWARD_METAL;
// backupType: backend used for ops that the primary backend (type) does not support
schedule_config.backupType = MNN_FORWARD_METAL;
```

Next, I commented out the zero-initialization code in `void MNNRobustVideoMatting::initialize_context()`:

```cpp
// resize session
mnn_interpreter->resizeSession(mnn_session);
// init 0.
// std::fill_n(r1i_tensor->host<float>(), r1i_size, 0.f);
// std::fill_n(r2i_tensor->host<float>(), r2i_size, 0.f);
// std::fill_n(r3i_tensor->host<float>(), r3i_size, 0.f);
// std::fill_n(r4i_tensor->host<float>(), r4i_size, 0.f);
```
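Commenting out the `std::fill_n` calls is likely a workaround for the fact that, on a GPU backend such as Metal, `r1i_tensor->host<float>()` typically does not point at writable CPU memory. The side effect is that the recurrent inputs `r1i`..`r4i` are no longer guaranteed to start at zero. A common pattern is to fill a host-side staging buffer and upload it; in MNN this would presumably be a host tensor with the same shape (e.g. `MNN::Tensor host_r1i(r1i_tensor, r1i_tensor->getDimensionType());`), filled via `host_r1i.host<float>()`, then `r1i_tensor->copyFromHostTensor(&host_r1i)`. The self-contained sketch below models only that pattern; `DeviceTensorModel` and `zero_init_recurrent_input` are hypothetical stand-ins, not MNN types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a device-resident tensor (NOT an MNN type). Host code
// has no pointer into its memory and can only upload or download whole buffers.
class DeviceTensorModel {
public:
    // In this model, -1 marks memory that was never written, so stale state is visible.
    explicit DeviceTensorModel(std::size_t n) : device_(n, -1.f) {}

    std::size_t size() const { return device_.size(); }

    // host -> device copy (plays the role of copyFromHostTensor)
    void upload(const std::vector<float>& host) {
        assert(host.size() == device_.size());
        device_ = host;
    }

    // device -> host copy (plays the role of copyToHostTensor)
    std::vector<float> download() const { return device_; }

private:
    std::vector<float> device_;
};

// Zero-initialize a recurrent input through a host staging buffer, instead of
// writing through a host pointer that a GPU-side tensor may not provide.
void zero_init_recurrent_input(DeviceTensorModel& r_in) {
    std::vector<float> staging(r_in.size());
    std::fill_n(staging.data(), staging.size(), 0.f);  // same role as the original fill_n
    r_in.upload(staging);
}
```

The same call would be repeated for each of the four recurrent inputs after `resizeSession`, so the first frame starts from a known zero state.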

Finally, I updated `void MNNRobustVideoMatting::update_context(const std::map<std::string, MNN::Tensor *> &output_tensors)`:

```cpp
void MNNRobustVideoMatting::update_context(const std::map<std::string, MNN::Tensor *> &output_tensors)
{
  auto device_r1o_ptr = output_tensors.at("r1o");
  auto device_r2o_ptr = output_tensors.at("r2o");
  auto device_r3o_ptr = output_tensors.at("r3o");
  auto device_r4o_ptr = output_tensors.at("r4o");

  // copy each recurrent output from the device into a new host tensor
  MNN::Tensor *cpu1 = MNN::Tensor::createHostTensorFromDevice(device_r1o_ptr, true);
  MNN::Tensor *cpu2 = MNN::Tensor::createHostTensorFromDevice(device_r2o_ptr, true);
  MNN::Tensor *cpu3 = MNN::Tensor::createHostTensorFromDevice(device_r3o_ptr, true);
  MNN::Tensor *cpu4 = MNN::Tensor::createHostTensorFromDevice(device_r4o_ptr, true);

  // write the host copies back into the same output tensors r1o..r4o
  device_r1o_ptr->copyFromHostTensor(cpu1);
  device_r2o_ptr->copyFromHostTensor(cpu2);
  device_r3o_ptr->copyFromHostTensor(cpu3);
  device_r4o_ptr->copyFromHostTensor(cpu4);

  // previous handoff (commented out): outputs copied into the recurrent inputs r1i..r4i
  //device_r1o_ptr->copyToHostTensor(r1i_tensor);
  //device_r2o_ptr->copyToHostTensor(r2i_tensor);
  //device_r3o_ptr->copyToHostTensor(r3i_tensor);
  //device_r4o_ptr->copyToHostTensor(r4i_tensor);

  context_is_update = true;
}
```
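In the `update_context` above, the host copies `cpu1`..`cpu4` are written back into the same output tensors `r1o`..`r4o`, so the recurrent inputs `r1i`..`r4i` are never updated from one frame to the next. Together with the skipped zero-initialization, each frame may run on stale or uninitialized context, which is one plausible cause of the degraded matting. The intended direction is presumably the one in the commented-out lines (output into input); with MNN on a GPU backend that could look like `r1i_tensor->copyFromHostTensor(cpu1)` followed by `delete cpu1`, since `createHostTensorFromDevice` allocates a new host tensor. The toy below is not RVM or MNN: `DeviceTensorModel`, `run_frame`, `run_sequence`, and both update functions are hypothetical, and the "model" simply adds each frame's value to the previous state so the effect of the handoff is easy to see.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a device-resident tensor (NOT an MNN type).
struct DeviceTensorModel {
    std::vector<float> data;  // stands in for device memory
    explicit DeviceTensorModel(std::size_t n) : data(n, 0.f) {}

    std::vector<float> download() const { return data; }  // device -> host copy
    void upload(const std::vector<float>& host) {         // host -> device copy
        assert(host.size() == data.size());
        data = host;
    }
};

// Toy recurrent step: r_out = r_in + frame. Only the data flow mirrors RVM.
void run_frame(const DeviceTensorModel& r_in, DeviceTensorModel& r_out, float frame) {
    for (std::size_t k = 0; k < r_in.data.size(); ++k) r_out.data[k] = r_in.data[k] + frame;
}

// Mirrors the code above: the output is copied back into itself; r_in never changes.
void update_context_self_copy(DeviceTensorModel& /*r_in*/, DeviceTensorModel& r_out) {
    r_out.upload(r_out.download());
}

// Mirrors the commented-out intent: this frame's output becomes the next frame's input.
void update_context_feed_back(DeviceTensorModel& r_in, DeviceTensorModel& r_out) {
    r_in.upload(r_out.download());
}

// Runs `frames` frames with value 1.0 and returns the final recurrent output.
template <typename Update>
float run_sequence(int frames, Update update) {
    DeviceTensorModel r_in(4), r_out(4);  // r_in starts at zero
    for (int f = 0; f < frames; ++f) {
        run_frame(r_in, r_out, 1.0f);
        update(r_in, r_out);
    }
    return r_out.data[0];
}
```

With the self-copy handoff, `run_sequence(3, update_context_self_copy)` returns 1.0 because the state never accumulates, while `run_sequence(3, update_context_feed_back)` returns 3.0, i.e. the recurrence actually carries information across frames.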

The final result looks like this:

DefTruth / lite.ai.toolkit #395