Maxfashko opened this issue 6 years ago
The TensorRT softmax layer API does not support axis: 2. You must implement it yourself.
Hi @chenzhi1992. I cannot understand in what format the data arrives in the enqueue function. As far as I can tell, it comes in as a flat vector in which, for the 21 Pascal VOC classes, 12764 elements are laid out sequentially; in other words, the input parameter is a pointer to a vector of 268044 elements.
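For reference, a minimal CPU sketch of softmax over axis 2, assuming each prior box's 21 class scores are stored contiguously in the flat buffer (the function and variable names are illustrative, not from the repo):

#include <algorithm>
#include <cmath>

// Reference only: softmax over the class axis for a flat buffer of
// shape [num_priors][num_classes], e.g. 12764 x 21 = 268044 floats.
void softmax_axis2_cpu(const float* in, float* out, int num_priors, int num_classes)
{
    for (int p = 0; p < num_priors; ++p) {
        const float* box_in  = in  + p * num_classes;
        float*       box_out = out + p * num_classes;

        // max over the class axis (for numerical stability)
        float max_val = box_in[0];
        for (int c = 1; c < num_classes; ++c)
            max_val = std::max(max_val, box_in[c]);

        // exp and sum over the class axis
        float sum = 0.f;
        for (int c = 0; c < num_classes; ++c) {
            box_out[c] = std::exp(box_in[c] - max_val);
            sum += box_out[c];
        }

        // normalize
        for (int c = 0; c < num_classes; ++c)
            box_out[c] /= sum;
    }
}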
// pluginImplement.cpp
int SoftmaxPlugin::enqueue(int batchSize, const void* const* inputs, void** outputs, void*, cudaStream_t stream)
{
    // raw device pointers for the layer output and input
    float* top_data = reinterpret_cast<float*>(outputs[0]);
    const float* bottom_data = reinterpret_cast<const float*>(inputs[0]);

    SoftmaxLayer(size_array, bottom_data, top_data, stream);
    return 0;
}
// I defined the output of the layer as follows:
Dims getOutputDimensions(int index, const Dims* inputs, int nbInputDims) override
{
    assert(nbInputDims == 1);
    assert(index == 0);
    assert(inputs[index].nbDims == 3);
    return DimsCHW(inputs->d[0], inputs->d[1], inputs->d[2]); // keep the output dimensions the same as the input
}
void configure(const Dims* inputs, int nbInputs, const Dims* outputs, int nbOutputs, int) override
{
    mCopySize  = inputs[0].d[0] * inputs[0].d[1] * inputs[0].d[2] * sizeof(float); // bytes to copy
    size_array = inputs[0].d[0] * inputs[0].d[1];                                  // element count passed to SoftmaxLayer
}
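Just to make the sizes concrete (an assumption on my part: the input to the plugin presumably has dims (12764, 21, 1), which is what the mbox_conf_reshape output and the 268044 figure above suggest, but I have not verified it):

// Hypothetical sanity check of the sizes for Pascal VOC:
constexpr int kNumPriors  = 12764;
constexpr int kNumClasses = 21;
static_assert(kNumPriors * kNumClasses == 268044, "matches the element count quoted above");
// size_array = d[0] * d[1]                  = 268044 elements
// mCopySize  = d[0] * d[1] * d[2] * 4 bytes = 1072176 bytes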
// mathFunctions.h
cudaError_t SoftmaxLayer(int nthreads,
                         const float* bottom_data,
                         float* top_data,
                         cudaStream_t stream);
The kernel source code is simple, and I will not post it.
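(For context, the kernel names above largely match the kernels in Caffe's softmax_layer.cu; in that style the reduction axis is handled entirely through indexing, and the reduction result is one value per position stored in an output array rather than a single scalar. A rough sketch of such a kernel, not the code actually used here:)

#include <cfloat>

// Caffe-style channel max: for data of shape (num, channels, spatial_dim),
// take the max over the channel axis independently for every (n, s) position.
__global__ void channel_max_sketch(const int num, const int channels, const int spatial_dim,
                                   const float* data, float* out)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < num * spatial_dim) {
        int n = index / spatial_dim;
        int s = index % spatial_dim;
        float maxval = -FLT_MAX;
        for (int c = 0; c < channels; ++c)
            maxval = fmaxf(maxval, data[(n * channels + c) * spatial_dim + s]);
        out[index] = maxval;
    }
}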
// mathFunctions.cu
cudaError_t SoftmaxLayer(int nthreads,
                         const float* bottom_data,
                         float* top_data,
                         cudaStream_t stream)
{
    float* max_elem = new float;
    float* sum      = new float;

    // search for the max element
    kernel_channel_max<float><<<TENSORRT_GET_BLOCKS(nthreads), TENSORRT_CUDA_NUM_THREADS, 0, stream>>>(nthreads, bottom_data, *max_elem);
    // subtract the max element
    kernel_channel_subtract<float><<<TENSORRT_GET_BLOCKS(nthreads), TENSORRT_CUDA_NUM_THREADS, 0, stream>>>(nthreads, bottom_data, *max_elem, top_data);
    // exp; the data with the max element subtracted is already in top_data
    kernel_exp<float><<<TENSORRT_GET_BLOCKS(nthreads), TENSORRT_CUDA_NUM_THREADS, 0, stream>>>(nthreads, top_data, top_data);
    // sum
    kernel_sum<float><<<TENSORRT_GET_BLOCKS(nthreads), TENSORRT_CUDA_NUM_THREADS, 0, stream>>>(nthreads, top_data, *sum);

    delete max_elem;
    delete sum;
    return cudaPeekAtLastError();
}
After these operations, what I see at the output does not look like what I need.
// main.cpp
// structure holding the attributes of the detection output layer
struct detection_out_struct {
    float image_id, label, score, xmin, ymin, xmax, ymax;
};

// fill the vector with the attributes of the detected objects
void set_detection_out() {
    nvinfer1::DimsCHW dims = this->tensorNet.getTensorDims("detection_out");
    detection_out_struct* out = (detection_out_struct*) this->allocateMemory(dims);

    for (int i = 0; i < this->tensorNet.getTensorDims("detection_out"); i++) {
        if (out[i].label != -1) {
            this->detection_out.push_back(&out[i]);
        }
    }
}

std::vector<detection_out_struct*> detection_out; // vector of pointers to the attribute structs of the detected objects
Then I draw all the boxes on the image and look at the result.
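(The drawing step itself is roughly like the sketch below, assuming the detection_out coordinates are normalized to [0, 1] as in the standard SSD DetectionOutput layer; the function name is mine:)

#include <opencv2/opencv.hpp>
#include <vector>

// Hypothetical drawing step: scale normalized SSD boxes to pixel coordinates
// and draw a rectangle for every detection above a score threshold.
void draw_detections(cv::Mat& img, const std::vector<detection_out_struct*>& dets, float score_threshold = 0.5f)
{
    for (const detection_out_struct* d : dets) {
        if (d->score < score_threshold) continue;
        cv::Point p1(static_cast<int>(d->xmin * img.cols), static_cast<int>(d->ymin * img.rows));
        cv::Point p2(static_cast<int>(d->xmax * img.cols), static_cast<int>(d->ymax * img.rows));
        cv::rectangle(img, p1, p2, cv::Scalar(0, 255, 0), 2);
    }
}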
What is the ConcatPlugin class implemented for? It is not used anywhere; instead, createConcatPlugin, declared in NvInferPlugin.h, is used:
else if (!strcmp(layerName, "mbox_loc"))
{
    assert(mBox_loc_layer.get() == nullptr);
    mBox_loc_layer = std::unique_ptr<INvPlugin, decltype(nvPluginDeleter)>
        (createConcatPlugin(1, true), nvPluginDeleter);
    return mBox_loc_layer.get();
}
@chenzhi1992, I've reviewed the Caffe softmax layer code: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/softmax_layer.cu. The main problem for me is how to set the axis for the max, sum, and exponent operations. Do I understand correctly that I get a pointer to a 1x268044 vector? How is the axis defined in this case?
Hi @Maxfashko, I am also stuck with the same problem. Were you able to fix it, and if so, could you tell me how?
@tianfanzhu, no, I did not solve this problem. I have other tasks, but as soon as I have time, I will study the Caffe code.
@Maxfashko Can you share your updated kernel functions for SoftmaxLayer here?
@linux-devil, I hope this helps more than the author of this repository did:
__global__ void kernelSoftmax(float* x, int channels, float* y)
{
    extern __shared__ float mem[]; // (dynamic shared memory is allocated below but not used)
    __shared__ float sum_value;

    // each block handles one group of `channels` consecutive values
    float number = *(x + blockDim.x * blockIdx.x + threadIdx.x);
    float number_exp = __expf(number);

    atomicAdd(&sum_value, number_exp);
    __syncthreads();

    // normalize by the block-wide sum of exponentials
    y[blockDim.x * blockIdx.x + threadIdx.x] = __fdiv_rd(number_exp, sum_value);
}

void cudaSoftmax(int n, int channels, float* x, float* y)
{
    // one block per group of `channels` elements, one thread per element
    kernelSoftmax<<<(n / channels), channels, channels * sizeof(float)>>>(x, channels, y);
    cudaDeviceSynchronize();
}
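For the mbox_conf case discussed above, the call would presumably look like this (a sketch: n is the total number of elements and channels is the number of classes, so each block handles one prior box, assuming each box's 21 scores are contiguous):

// Sketch of the call from SoftmaxPlugin::enqueue, using the numbers quoted above.
const int num_classes = 21;               // Pascal VOC
const int num_priors  = 12764;            // prior boxes
const int n = num_priors * num_classes;   // 268044 elements in total

cudaSoftmax(n, num_classes,
            const_cast<float*>(bottom_data),  // input scores (device pointer)
            top_data);                        // softmax output (device pointer)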
@Maxfashko how did you get the label names from label.txt? Thanks
@quocbh https://github.com/Maxfashko/NV_TRT_SSD enjoy yourself
In void cudaSoftmax(int n, int channels, float* x, float* y), what is n? How do you calculate n?
You should add one line after __shared__ float sum_value;:

    __shared__ float sum_value;
    sum_value = 0;
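One more caveat worth adding: relying on every thread writing sum_value = 0 with no barrier before the atomicAdd is warp-synchronous programming; with 21 threads per block it will usually work, but a safer variant initializes from a single thread and synchronizes. A sketch (the kernel name is mine, not from the repo):

// Sketch of the same kernel with an explicit init + barrier
// (channels == blockDim.x, one prior box per block).
__global__ void kernelSoftmaxSafe(const float* x, int channels, float* y)
{
    __shared__ float sum_value;
    if (threadIdx.x == 0)
        sum_value = 0.0f;
    __syncthreads();                       // init is visible to all threads

    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    float number_exp = __expf(x[idx]);
    atomicAdd(&sum_value, number_exp);
    __syncthreads();                       // per-block sum is complete

    y[idx] = number_exp / sum_value;
}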
Hey. I implemented the inference using this code, but the boxes do not seem to correspond to the actual locations of the objects in the image. I did not implement the softmax layer in the code; instead I used the usual Caffe softmax layer.
Is it necessary to implement the softmax layer myself?