OroChippw / SegmentAnything-OnnxRunner

SegmentAnything-OnnxRunner is an example of using Meta AI Research's SAM ONNX model in C++. The encoder and decoder of SAM are decoupled in this repository.
MIT License

About result. #30

Open cainiaoshidai opened 11 months ago

cainiaoshidai commented 11 months ago

Hi, the result I got after running inference with this code is shown below. Can you give me some advice on how to solve this problem?

cainiaoshidai commented 11 months ago

[screenshot: incorrect segmentation result]

OroChippw commented 11 months ago

Did you pull the code from the master branch? The code is currently being iterated on, and a merge on the master branch went wrong 😂😂😂. You can try switching to the dev branch and adjusting the parameters in the code to see if the problem persists. You will also be notified once the master branch is fixed. Thank you for paying attention to this repository. ❤️❤️❤️

cainiaoshidai commented 11 months ago

> Did you pull the code from the master branch? The code is currently being iterated on, and a merge on the master branch went wrong 😂😂😂. You can try switching to the dev branch and adjusting the parameters in the code to see if the problem persists. You will also be notified once the master branch is fixed. Thank you for paying attention to this repository. ❤️❤️❤️

Yes, I pulled the code from the master branch, but the problem still exists after switching to the dev branch. I am using my own ONNX model, and it works fine in Python.

cainiaoshidai commented 11 months ago

Hi, I think I found the reason. I compared this with the Python code and found that the image normalization step is missing from the image preprocessing. After I added the normalization, the results were great. Thank you for your project; it has been very helpful to me.

```python
# SAM's reference preprocessing (Python): convert the image to a tensor,
# reorder HWC -> NCHW, then normalize with the per-channel pixel mean and std.
input_image_torch = torch.as_tensor(input_image, device=device)
input_image_torch = input_image_torch.permute(2, 0, 1).contiguous()[None, :, :, :]

pixel_mean = torch.Tensor([123.675, 116.28, 103.53]).view(-1, 1, 1).to(device)
pixel_std = torch.Tensor([58.395, 57.12, 57.375]).view(-1, 1, 1).to(device)

x = (input_image_torch - pixel_mean) / pixel_std
```

OroChippw commented 11 months ago

Which part of the code needs to be modified? Can you give a correct example for reference? Thank you for your contribution.

cainiaoshidai commented 11 months ago

I changed the code of Image_PreProcess as follows.

```cpp
std::cout << "PreProcess Image ..." << std::endl;

// Convert BGR -> RGB and to float.
cv::Mat rgbImage;
cv::cvtColor(srcImage, rgbImage, cv::COLOR_BGR2RGB);
cv::Mat floatImage;
rgbImage.convertTo(floatImage, CV_32FC3);

// Per-channel normalization with SAM's pixel mean and std.
// Note: the fixed 1024x1024 mats assume srcImage is already 1024x1024;
// subtracting/dividing by a cv::Scalar directly would avoid that requirement.
cv::Mat pixelMean = cv::Mat::ones(cv::Size(1024, 1024), CV_32FC3);
cv::Mat pixelStd = cv::Mat::ones(cv::Size(1024, 1024), CV_32FC3);
pixelMean = cv::Scalar(123.675, 116.28, 103.53);
pixelStd = cv::Scalar(58.395, 57.12, 57.375);
floatImage -= pixelMean;
floatImage /= pixelStd;

// Resize so the longest side equals the encoder input size.
cv::Mat resizeImage = ResizeLongestSide_apply_image(floatImage, EncoderInputSize);
// Scaling to [0, 1] is no longer needed:
// resizeImage.convertTo(resizeImage, 1.0 / 255.0);

// Pad bottom/right to a square EncoderInputSize x EncoderInputSize input.
int pad_h = EncoderInputSize - resizeImage.rows;
int pad_w = EncoderInputSize - resizeImage.cols;

cv::Mat paddingImage;
cv::copyMakeBorder(resizeImage, paddingImage, 0, pad_h, 0, pad_w,
                   cv::BorderTypes::BORDER_CONSTANT, cv::Scalar(0, 0, 0));

std::cout << "paddingImage width : " << paddingImage.cols
          << ", paddingImage height : " << paddingImage.rows << std::endl;
return paddingImage;
```

I think the image should be normalized this way (subtracting the per-channel mean and dividing by the per-channel std), rather than being scaled to [0, 1] or left unnormalized.