SHI-Labs / OneFormer

OneFormer: One Transformer to Rule Universal Image Segmentation, arXiv 2022 / CVPR 2023
https://praeclarumjj3.github.io/oneformer
MIT License
1.41k stars, 128 forks

How to generate instance mask, only one channel? #27

Closed: rockywind closed this issue 1 year ago

praeclarumjj3 commented 1 year ago

Hi @rockywind, thanks for your interest in our work.

If I understand your question correctly, you want to generate a single-channel segmentation mask for the instance segmentation predictions. Please provide more description about your issue if this is not the case.

You can loop through the pred_masks stored in the instance predictions, assign an ID to each mask, and aggregate those into a single channel mask. https://github.com/SHI-Labs/OneFormer/blob/761189909f392a110a4ead574d85ed3a17fbc8a7/oneformer/oneformer_model.py#L475

rockywind commented 1 year ago

Hi, thank you for your help. In the mask I want, each pixel value represents an instance category: 1, 2, 3, and so on, with 0 representing the background. However, I found that the values in result.pred_masks are between 0 and 1. The shape of result.pred_masks is [7, 1114, 2191], and the image size is [1114, 2191].

praeclarumjj3 commented 1 year ago

I believe you are talking about the semantic segmentation result, where each pixel's value is the category of the object it belongs to.

You need to do an argmax operation on the semantic predictions to obtain those. https://github.com/SHI-Labs/OneFormer/blob/761189909f392a110a4ead574d85ed3a17fbc8a7/demo/predictor.py#L68
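
For illustration, here is a minimal sketch of that argmax step on a toy tensor. It assumes the semantic predictions are a tensor of per-class scores with shape [C, H, W]; the helper name semantic_label_map and the toy values are made up for this example and are not part of OneFormer.

```python
import torch


def semantic_label_map(sem_seg: torch.Tensor) -> torch.Tensor:
    """Collapse per-class scores of shape [C, H, W] into an [H, W] map of class indices.

    Hypothetical helper for illustration; the [C, H, W] layout is an assumption.
    """
    return sem_seg.argmax(dim=0)


# Toy stand-in for the model's semantic output: 3 classes on a 2x2 image.
scores = torch.tensor([
    [[0.90, 0.10], [0.20, 0.30]],  # class 0 scores
    [[0.05, 0.80], [0.10, 0.30]],  # class 1 scores
    [[0.05, 0.10], [0.70, 0.40]],  # class 2 scores
])

# Each pixel now holds the index of its highest-scoring class.
label_map = semantic_label_map(scores)
```

Note that this gives one category per pixel, so two objects of the same category share the same value.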

rockywind commented 1 year ago

Hi, sorry for not being clear before. The following is sample data. There are 3 cars in the picture: the first car's pixel value is 1, the second car's pixel value is 2, and the third car's pixel value is 3. [Image: 0151]

praeclarumjj3 commented 1 year ago

> Each pixel value represents an instance category, the value is 1,2,3, and so on. The 0 is the representation's background. But, I found that the value of result.pred_masks is between 0 and 1, the shape of result.pred_masks is [7, 1114, 2191], the image's size is [1114, 2191].

Right, that's what I thought you wanted to do. You can loop through the result.pred_masks, assign an ID (starting from 1) to each mask, and aggregate them on an all-zeros mask. Please find the pseudo-code below:

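import torch

masks = result.pred_masks  # shape [N, H, W], e.g. [7, 1114, 2191]

# create an all-zeros mask with the image's height and width, e.g. (1114, 2191)
single_channel_mask = torch.zeros(masks.shape[-2:], dtype=torch.int64, device=masks.device)

# loop through all instance masks, assigning IDs 1, 2, 3, ...
for instance_id, mask in enumerate(masks, start=1):
    # assumption: threshold at 0.5 in case the mask holds scores in [0, 1] rather than binary values
    labeled = (mask > 0.5).to(torch.int64) * instance_id  # does not modify result.pred_masks
    # where masks overlap, the larger ID wins
    single_channel_mask = torch.max(single_channel_mask, labeled)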
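
The same aggregation can also be written without a Python loop. The following is only a sketch under stated assumptions: the function name instance_masks_to_id_map, the 0.5 threshold, and the toy masks are hypothetical, and pred_masks is assumed to be a tensor of shape [N, H, W] with values in [0, 1].

```python
import torch


def instance_masks_to_id_map(pred_masks: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Merge [N, H, W] instance masks into one [H, W] map with IDs 1..N and 0 as background.

    Hypothetical helper; where instances overlap, the larger ID wins.
    """
    binary = pred_masks > threshold  # [N, H, W] boolean masks
    num_instances = binary.shape[0]
    if num_instances == 0:
        # No instances: everything is background.
        return torch.zeros(binary.shape[-2:], dtype=torch.int64, device=binary.device)
    # IDs 1..N, shaped [N, 1, 1] so they broadcast over height and width.
    ids = torch.arange(1, num_instances + 1, device=binary.device).view(-1, 1, 1)
    return (binary.to(torch.int64) * ids).max(dim=0).values


# Toy example: two instances on a 2x3 image, overlapping at pixel (0, 1).
toy_masks = torch.tensor([
    [[1.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 1.0, 1.0], [0.0, 0.0, 0.2]],
])
id_map = instance_masks_to_id_map(toy_masks)
```

If the instances should be ordered differently on overlap (for example by confidence score), the masks would need to be sorted before calling the helper.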

Let me know if you have any more issues.

rockywind commented 1 year ago

Thank you very much. I'll give it a try!