Lsan2401 / RMSIN

Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation

Problem with your dataset #9

Open vvuonghn opened 8 months ago

vvuonghn commented 8 months ago

Hi @Lsan2401, currently you provide instances.json with masks in RLE format; it looks like this:

ann['segmentation'] [{'size': [800, 800], 'counts': 'ikn9110030M1082GN0002B0NN010O0021gf03WYOL20O080G11O?0@10O0130MO01jf0OVYO00000000000:0Ff00[O0O0020O0O00003gf0MYYO011OO10O27h0IVO0013OM111Ogf01WYO00O000]10cN10O010O1if0OXYO0O0000e11[NO20Nhf00XYO00l10TN01hf0OXYO00n10SN0Ooe00RZO3ON011NO01\2OdM010001be00bZONM0O11b2O]M000011^e00bZO5OJ001e20[MO20N_e00bZO20NO10O0i22WMN1_e0OaZO12O00Nm21SMO0_e00aZO0110R3ae0mL^ZO0010V3e0kLZON0Y3od0gLS[O2NN0010O10O0_30bL1OO11O_d02c[ONM000012d3N[L0010^d02a[O30K10Oi30XL0O`d00a[O21ON00g30XL21^d0O[O12OOQ4OoK00_d00a[O0100T4cd0lK\...}]

Could you provide the polygon format (the same as RefCOCO)?

ann['segmentation'] [[223.18, 477.41, 178.25, 476.84, 167.3, 468.2, 156.93, 464.16, 151.17, 464.74, 141.38, 471.65, 132.16, 476.26, 125.25, 476.26, 126.98, 451.49, 113.73, 448.61, 103.93, 439.39, 111.42, 419.81, 136.19, 373.15, 140.8, 363.36, 169.03, 352.99, 166.72, 337.43, 174.21, 301.72, 184.01, 300.57, 200.14, 299.99, 214.54, 314.39, 215.69, 332.83, 211.08, 359.32, 224.91, 372.57, 232.97, 388.13, 238.15, 420.96, 237.0, 443.43, 224.91, 452.64, 219.14, 453.22]]

I can convert from RLE to polygons, but information is lost during the conversion.
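
A minimal sketch of such a conversion (assuming pycocotools and OpenCV are installed; the helper name is illustrative, not from the RMSIN codebase). Contour tracing and point simplification are both approximate, which is where the loss comes from:

```python
# Sketch (not from the RMSIN codebase): convert a COCO-style RLE annotation
# to RefCOCO-style polygons with pycocotools and OpenCV. Contour tracing and
# CHAIN_APPROX_SIMPLE point simplification are both lossy.
import cv2
import numpy as np
from pycocotools import mask as mask_utils

def rle_to_polygons(rle):
    # JSON stores 'counts' as str; pycocotools expects bytes.
    if isinstance(rle["counts"], str):
        rle = dict(rle, counts=rle["counts"].encode("utf-8"))
    binary = mask_utils.decode(rle)  # (H, W) uint8 array of 0/1
    contours, _ = cv2.findContours(
        binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
    # Flatten each contour to [x1, y1, x2, y2, ...], the RefCOCO layout.
    return [c.flatten().astype(float).tolist() for c in contours if len(c) >= 3]
```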

Lsan2401 commented 8 months ago

> Currently you provide instances.json with masks in RLE format. Could you provide the polygon format (the same as RefCOCO)? I can convert from RLE to polygons, but information is lost during the conversion.

We stored the masks in RLE format from the start, so even if your conversion code is correct, some precision loss is inevitable when approximating RLE masks with polygons. Nevertheless, the masks in the two formats shouldn't deviate significantly. By the way, if you only wish to visualize the masks, you can use the RLE masks directly; there is no need to convert them to the polygon format.
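
For example, a minimal decode-and-overlay sketch (using pycocotools and matplotlib; the helper below is illustrative, not code from this repo):

```python
# Sketch: visualize an RLE mask directly by decoding it to a binary array
# and overlaying it on the image -- no polygon conversion needed.
import matplotlib.pyplot as plt
from PIL import Image
from pycocotools import mask as mask_utils

def show_rle(image_path, rle):
    if isinstance(rle["counts"], str):
        rle = dict(rle, counts=rle["counts"].encode("utf-8"))
    binary = mask_utils.decode(rle)  # (H, W) uint8 array of 0/1
    plt.imshow(Image.open(image_path))
    plt.imshow(binary, alpha=0.5)    # translucent mask overlay
    plt.axis("off")
    plt.show()
```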

vvuonghn commented 8 months ago

Hi @Lsan2401, thank you for your reply! Could you provide binary masks for the segmentation rather than RLE? They would be more accurate than masks converted from RLE.

Lsan2401 commented 8 months ago

> Could you provide binary masks for the segmentation rather than RLE? They would be more accurate than masks converted from RLE.

You can easily obtain the binary mask by referring to the data pre-processing code in our project. In fact, the mask is converted to binary format during the training and testing processes.
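
For reference, a hedged sketch of that conversion step (the COCO-style "annotations" layout and field names below are assumed; this loop is illustrative, not the project's actual pre-processing code):

```python
# Illustrative sketch: decode every RLE annotation in instances.json to a
# binary mask, mirroring what happens during training/testing pre-processing.
import json
import numpy as np
from pycocotools import mask as mask_utils

with open("instances.json") as f:
    instances = json.load(f)

binary_masks = {}
for ann in instances["annotations"]:  # assumed COCO-style layout
    rles = [
        dict(r, counts=r["counts"].encode("utf-8"))
        if isinstance(r["counts"], str) else r
        for r in ann["segmentation"]
    ]
    decoded = mask_utils.decode(rles)  # (H, W, N)
    binary_masks[ann["id"]] = decoded.any(axis=2).astype(np.uint8)
```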

vvuonghn commented 8 months ago

Hi @Lsan2401, because there are a lot of small objects in the dataset, any loss of ground-truth pixels when converting from RLE to mask has a significant impact on performance. For example, losing boundary pixels of a small mask may significantly reduce the Intersection over Union (IoU): if the original mask contains 100 pixels but the converted mask keeps only 50 of them, the IoU drops to at most 50/100 = 0.5.

vvuonghn commented 8 months ago

@Lsan2401 By the way, could you please check some samples in your public dataset? Some masks may have problems.

Here is a sample (01172.jpg). Text: "The gray bridge in the middle". Why doesn't the segmentation cover the bridge?

[screenshot: segmentation mask for 01172.jpg]

In the case of 22118.jpg, text: "The yellow and orange large overpass". Why does the segmentation extend beyond the bbox? I think the bbox should cover the segmentation area.

[screenshot: segmentation mask and bounding box for 22118.jpg]

vvuonghn commented 8 months ago

Here are three problems with your dataset. If you release the dataset in RLE format, there is information loss. Could you please check?

[screenshot: three problematic samples]

Lsan2401 commented 8 months ago

> Because there are a lot of small objects in the dataset, any loss of ground-truth pixels when converting from RLE to mask has a significant impact on performance; losing boundary pixels of a small mask may significantly reduce the IoU.

We have computed the IoU between the binary mask directly generated by the model and those that have been converted to RLE format and then back to binary masks. It appears that there is no discernible precision loss, as the IoU values for all objects remain consistently at 1.
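
A quick way to reproduce that check (a sketch, not the authors' script): encode a binary mask to RLE with pycocotools, decode it back, and compare.

```python
# Sketch of the round-trip check: RLE encoding/decoding is exact, so the
# IoU between the original binary mask and the round-tripped one is 1.0.
import numpy as np
from pycocotools import mask as mask_utils

def roundtrip_iou(binary_mask):
    rle = mask_utils.encode(np.asfortranarray(binary_mask.astype(np.uint8)))
    restored = mask_utils.decode(rle)
    inter = np.logical_and(binary_mask, restored).sum()
    union = np.logical_or(binary_mask, restored).sum()
    return inter / union

mask = np.zeros((800, 800), dtype=np.uint8)
mask[100:150, 200:260] = 1    # a toy small object
print(roundtrip_iou(mask))    # -> 1.0, i.e. lossless
```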

Lsan2401 commented 8 months ago

> Could you please check some samples in your public dataset? Some masks may have problems: in 01172.jpg ("The gray bridge in the middle") the segmentation does not cover the bridge, and in 22118.jpg ("The yellow and orange large overpass") the segmentation extends beyond the bbox.

Thank you for bringing this to our attention. In our human refinement process, our primary focus was on boundary precision, so we may have overlooked instances where objects were incorrectly split by SAM. We acknowledge this issue and will double-check and rectify such cases in the next release. A mask extending beyond the bounding-box border is normal: the boxes' boundary information is itself inaccurate, and we manually adjusted the SAM-generated masks to compensate for these inaccuracies.

vvuonghn commented 7 months ago

@Lsan2401 Have you fixed these issues in the dataset? If yes, could you please share the refined dataset? And could you also provide the binary masks?