@VoVAllen I tried this zero-copy implementation and it looks good on my side. The change is basically to replace set_output_buf with set_output and pass a Tensor object. I checked its implementation here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/op_kernel.cc#L870. set_output decides whether the provided Tensor can be forwarded, which avoids a copy. I printed the logs with TF_CPP_MIN_VLOG_LEVEL set and did not see a physical copy being performed (here). Could you help verify this by checking the memory consumption as well?
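To make the check concrete, here is a toy sketch of the difference between forwarding and copying. It is not TensorFlow code: ToyTensor, ToyContext, set_output_by_copy and shares_buffer are hypothetical names made up for illustration. The only assumption carried over from TensorFlow is that a Tensor is a handle to a reference-counted buffer, so passing the handle along shares the memory instead of duplicating it.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-in for a tensor: a handle to a reference-counted buffer.
// Hypothetical type, not tensorflow::Tensor.
struct ToyTensor {
  std::shared_ptr<std::vector<float>> buf;
  const float* data() const { return buf ? buf->data() : nullptr; }
};

// Toy stand-in for a kernel context that holds a single output slot.
// Hypothetical type, not tensorflow::OpKernelContext.
struct ToyContext {
  ToyTensor output;

  // Forwarding: only the handle is copied, so the output shares the
  // caller's buffer and no element data is duplicated.
  void set_output(const ToyTensor& t) { output = t; }

  // Copying: a new buffer is allocated and every element is duplicated.
  void set_output_by_copy(const ToyTensor& t) {
    output.buf = std::make_shared<std::vector<float>>(*t.buf);
  }
};

// True when both tensors point at the same underlying memory.
bool shares_buffer(const ToyTensor& a, const ToyTensor& b) {
  return a.data() != nullptr && a.data() == b.data();
}
```

Comparing the data pointers before and after the call is a cheap complement to the memory-consumption check: if the pointers match, no physical copy happened.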