cyrusbehr / YOLOv8-TensorRT-CPP

YOLOv8 TensorRT C++ Implementation
MIT License
558 stars · 66 forks

Segmentation: Output at index 0 has incorrect length #63

Open darkrotor opened 3 months ago

darkrotor commented 3 months ago

Hello, @cyrusbehr! Thanks for the code you provided in this repository. I'm using the standard segmentation yolov8s-seg.pt model from https://github.com/ultralytics/ultralytics that was exported to ONNX with:

model.fuse()
model.info(verbose=False)
model.export(format="onnx")

But I'm getting:

terminate called after throwing an instance of 'std::logic_error'
  what(): Output at index 0 has incorrect length

Also, when building an engine with FP16 precision, I'm getting the following messages:

onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Model only supports fixed batch size of 1
TensorRT encountered issues when converting weights between types and that could affect accuracy. If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights. Check verbose logs for the list of affected weights.

  • 69 weights are affected by this issue: Detected subnormal FP16 values.
Success, saved engine to yolov8s-seg.engine.Orin.fp16.1.1

Could you give me some advice on how to fix this error and how to set the segmentation parameters correctly?

The model can be found here: https://drive.google.com/drive/folders/1NWKfH7dMCbuK1GjpsHnTibiV3zkvQCO9

ryan-lang commented 2 months ago

@darkrotor Were you able to find a way past this? I'm having the same issue.

Best I can tell, in my case, the sample code is expecting the mask output shape (1, 32, 5120) but the model output is actually (1, 32, 160, 160), so there's some flattening of the last two dimensions that isn't happening. I could be way off base though.

If there's a problem getting the output dimensions, then it's possibly an issue with tensorrt-cpp-api.

ryan-lang commented 2 months ago

I added some logging, which revealed something different than I thought. It looks like the order of the two outputs is getting swapped somewhere along the way.

I'm using the yolov8n-seg model, which has the output shape: ((1, 116, 8400), (1, 32, 160, 160)), but after inference, I'm seeing (1, 32, 160, 160) in the first output and (1, 116, 8400) in the second. I downloaded the model and did the onnx conversion per the docs.

In its current state, it immediately throws an error due to the sizes of the two outputs being opposite of what's expected.

Here's where I added the logging:

std::vector<Object> YoloV8::postProcessSegmentation(std::vector<std::vector<float>> &featureVectors)
{
    const auto &outputDims = m_trtEngine->getOutputDims();

    // These assume the detection output (1 x channels x anchors) sits at index 0
    int numChannels = outputDims[0].d[1];
    int numAnchors = outputDims[0].d[2];

    std::cout << "Output dims 0 " << outputDims[0].d[0] << " " << outputDims[0].d[1] << " " << outputDims[0].d[2] << " " << outputDims[0].d[3] << std::endl;
    std::cout << "Output dims 1 " << outputDims[1].d[0] << " " << outputDims[1].d[1] << " " << outputDims[1].d[2] << " " << outputDims[1].d[3] << std::endl;

    const auto numClasses = numChannels - SEG_CHANNELS - 4;

    std::cout << "Vector 0 size " << featureVectors[0].size() << std::endl;
    std::cout << "Vector 1 size " << featureVectors[1].size() << std::endl;

Prints:

Output dims 0 1 32 160 160
Output dims 1 1 116 8400 0
Vector 0 size 819200
Vector 1 size 974400
Ray005 commented 2 months ago

I discovered that changing the imgsz when training your own yolov8 model can cause issues. It should be kept at the default value of 640 in Ultralytics.

You can find it here: https://docs.ultralytics.com/modes/train/#usage-examples

# Train the model
results = model.train(data="sync-dataset.yaml", epochs=100, imgsz=640)

I hope this helps.

ryan-lang commented 2 months ago

@Ray005 This may work for others, but in my case, I'm not doing any training, just exporting the base model:

yolo export model=yolov8n-seg.pt format=onnx imgsz=640

jeremy-cv commented 2 months ago

Hi @ryan-lang ,

I got the same issue, and it seems you were right: the outputs are swapped.

I solved it by simply swapping featureVectors[0] and featureVectors[1], and by taking outputDims[1] instead of outputDims[0] to get the right numChannels and numAnchors.