Jyxarthur / flowsam

Official Implementation of "Moving Object Segmentation: All You Need Is SAM (and Flow)" Junyu Xie, Charig Yang, Weidi Xie, Andrew Zisserman
https://www.robots.ox.ac.uk/~vgg/research/flowsam/
Apache License 2.0

Inference with FlowPSAM+FlowISAM #8

Open jes-bro opened 3 weeks ago

jes-bro commented 3 weeks ago

The paper mentions "layering the motion segmentation masks behind RGB-based segmentation masks." I don't see any layering in the eval script. How/where do you perform inference for FlowP-SAM + FlowI-SAM?

charigyang commented 3 weeks ago

Hi, thanks for your interest in our work.

This is L66-68 in seq_level_postprocess.py

jes-bro commented 3 weeks ago

I'm assuming pred_dir should have the RGB/FlowPSAM-based segmentations and the flow_pred_dir should have the flow/FlowISAM based ones? Thanks

charigyang commented 3 weeks ago

That is correct. We will clarify this in the code/md shortly.
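
For readers following along, the layering mentioned in the question can be pictured roughly as below. This is a minimal sketch, not the code in seq_level_postprocess.py: it assumes each prediction is an integer label map with 0 as background, and the helper name layer_masks and the ID offset are hypothetical.

```python
import numpy as np

def layer_masks(rgb_labels, flow_labels):
    """Layer flow-based masks behind RGB-based masks (hypothetical helper).

    Both inputs are (H, W) integer label maps where 0 means background.
    RGB-based (FlowP-SAM) labels stay on top; flow-based (FlowI-SAM)
    labels only fill pixels that the RGB-based map leaves as background.
    """
    rgb_labels = np.asarray(rgb_labels)
    flow_labels = np.asarray(flow_labels)
    assert rgb_labels.shape == flow_labels.shape, "label maps must match in size"
    # Shift flow IDs so they cannot collide with RGB-based IDs (an assumption).
    offset = int(rgb_labels.max())
    shifted = np.where(flow_labels > 0, flow_labels + offset, 0)
    return np.where(rgb_labels > 0, rgb_labels, shifted)

rgb = np.array([[1, 1, 0],
                [0, 0, 0]])
flow = np.array([[1, 0, 1],
                 [0, 1, 1]])
combined = layer_masks(rgb, flow)
```

Here the pixel claimed by both maps keeps its RGB-based label, and the flow-based object only shows through where the RGB-based map is empty.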

jes-bro commented 3 weeks ago

Thanks so much!

jes-bro commented 3 weeks ago

Another question: In the dataset directory structure:

{data_name}/
├── JPEGImages/
│   └── {category_name}/
│       ├── 00000.jpg
│       └── ......
├── FlowImages_gap1/
│   └── {category_name}/
│       ├── 00000.png
│       └── ......
├── ...... (More flow images)

Is it assumed that FlowImages_gap1/00000.png is between JPEG images 00000.jpg and 00001.jpg, and FlowImages_gap2/00000.png would be between 00000.jpg and 00002.jpg?

charigyang commented 3 weeks ago

That's correct. More precisely, FlowImages_gap1/00000.png is the flow from JPEG image 00000.jpg to 00001.jpg.
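
The naming convention confirmed here can be written down as a small lookup. This is a sketch under the assumptions stated in the thread (five-digit, zero-padded names; flow index i with gap g goes from frame i to frame i + g); the function name flow_frame_pair is hypothetical.

```python
def flow_frame_pair(flow_name, gap):
    """Return the (source, target) JPEG names for one flow image.

    Assumes five-digit, zero-padded frame names as in the directory listing,
    and that a flow image with index i and gap g goes from frame i to i + g.
    """
    index = int(flow_name.split(".")[0])
    return f"{index:05d}.jpg", f"{index + gap:05d}.jpg"

pair_gap1 = flow_frame_pair("00000.png", gap=1)
pair_gap2 = flow_frame_pair("00000.png", gap=2)
```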

jes-bro commented 3 weeks ago

Also, what should val_seq be? It's set to None in the dataset config and I'm not sure what it represents

charigyang commented 3 weeks ago

Leaving it as None defaults to evaluating on the entire validation set. You may put in your list of sub-sequences instead if you're debugging something and don't want to evaluate on all sequences, but otherwise just leave it as such.

L35-52 of data/dataloaders/ytvos_loader.py should show an example of this.
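
The behaviour described for val_seq amounts to an optional filter. The sketch below is not the loader's code; select_sequences and the sequence names are hypothetical and only illustrate "None means everything, a list means just those".

```python
def select_sequences(all_sequences, val_seq=None):
    """Pick which sequences to evaluate (hypothetical helper).

    val_seq=None keeps every sequence; otherwise only the listed names are
    kept, in their original order.
    """
    if val_seq is None:
        return list(all_sequences)
    wanted = set(val_seq)
    missing = wanted - set(all_sequences)
    if missing:
        raise ValueError(f"unknown sequences: {sorted(missing)}")
    return [name for name in all_sequences if name in wanted]

sequences = ["bear", "camel", "dog"]  # placeholder names
full_run = select_sequences(sequences)
debug_run = select_sequences(sequences, val_seq=["dog"])
```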

jes-bro commented 3 weeks ago

It looks like seq_level_postprocess is missing an import for read. Where does that come from?

charigyang commented 3 weeks ago

Hi, sorry -- we'll add to the main version soon, but meanwhile this is the function.

import os
import numpy as np

TAG_FLOAT = 202021.25  # magic number at the start of every Middlebury .flo file

def read(file):
    """Read a .flo optical flow file into an (H, W, 2) float32 array."""
    assert isinstance(file, str), "file is not str %r" % str(file)
    assert os.path.isfile(file), "file does not exist %r" % str(file)
    assert file.endswith('.flo'), "file ending is not .flo %r" % file[-4:]
    with open(file, 'rb') as f:
        flo_number = np.fromfile(f, np.float32, count=1)[0]
        assert flo_number == TAG_FLOAT, 'Flow number %r incorrect. Invalid .flo file' % flo_number
        # The header stores width first, then height, both as int32.
        w = int(np.fromfile(f, np.int32, count=1)[0])
        h = int(np.fromfile(f, np.int32, count=1)[0])
        data = np.fromfile(f, np.float32, count=2 * w * h)
    # Reshape data into a 3D array (rows, columns, bands);
    # reshape raises on a truncated file instead of silently repeating data.
    flow = np.reshape(data, (h, w, 2))
    return flow

jes-bro commented 3 weeks ago

Is seq_level_postprocess how you generated the figures on the website? As in the segmentation overlaid with the RGB?

charigyang commented 3 weeks ago

To create an overlay figure, use the mask output together with cv2.addWeighted. I don’t think it’s in the repo
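
A minimal sketch of such an overlay is shown below. It uses NumPy only so that it runs without OpenCV; cv2.addWeighted(image, 1 - alpha, color_layer, alpha, 0) computes the same kind of weighted sum over the whole image. The helper name overlay_mask, the color, and the alpha value are assumptions, not taken from the repo.

```python
import numpy as np

def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend a solid color into the masked pixels of an RGB image.

    image: (H, W, 3) uint8 array; mask: (H, W) boolean or 0/1 array.
    Pixels outside the mask are returned unchanged.
    """
    image = np.asarray(image, dtype=np.float32)
    mask = np.asarray(mask).astype(bool)
    color_layer = np.empty_like(image)
    color_layer[...] = color
    blended = (1.0 - alpha) * image + alpha * color_layer
    out = np.where(mask[..., None], blended, image)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

frame = np.full((2, 2, 3), 100, dtype=np.uint8)
mask = np.array([[1, 0], [0, 0]])
overlay = overlay_mask(frame, mask)
```

Restricting the blend to masked pixels keeps the rest of the frame at full brightness, which is usually what segmentation figures want.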

jes-bro commented 3 weeks ago

Another question, in some of the figures on the website, it says flowISAM (sequential), does that mean the sequential post processing script was used to generate them? Thanks!

charigyang commented 3 weeks ago

For single-object cases we just use the first mask for all frames, so there is no need to run the sequential post-processing. For multi-object cases, yes.

jes-bro commented 2 weeks ago

How do you filter out just the first mask?

jes-bro commented 2 weeks ago

Also, where does the sum function used here come from?

def iou(masks, gt, thres=0.5):
    """ IoU predictions """
    masks = (masks>thres).float()
    gt = (gt>thres).float()
    intersect = (masks * gt).sum(dim=[-2, -1])
    union = masks.sum(dim=[-2, -1]) + gt.sum(dim=[-2, -1]) - intersect
    empty = (union < 1e-6).float()
    iou = torch.clip(intersect/(union + 1e-12) + empty, 0., 1.)
    return iou

I got this error:

/flowsam/utils.py", line 21, in iou
    intersect = (masks * gt).sum(dim=[-2, -1])
TypeError: _sum() got an unexpected keyword argument 'dim'

I also got an error about floats in iou. Is that version of the iou function up to date?

And just to double check, the flow_dir directories should point to Flows_gapX directories, right? With the .flo files? Thanks!

jes-bro commented 2 weeks ago

I think in seq_level_postprocess.py the masks are being np.stacked together, and it seems like they should be torch.stacked together, if iou assumes that masks is a tensor?
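
The TypeError above is consistent with masks being NumPy arrays: NumPy reductions take axis, while dim is the PyTorch keyword. As a sketch only, a NumPy port of the iou function quoted above could look like this (the name iou_numpy is hypothetical, and the repo's own fix may differ):

```python
import numpy as np

def iou_numpy(masks, gt, thres=0.5):
    """IoU over the last two axes, mirroring the torch version quoted above."""
    masks = (np.asarray(masks) > thres).astype(np.float32)
    gt = (np.asarray(gt) > thres).astype(np.float32)
    intersect = (masks * gt).sum(axis=(-2, -1))
    union = masks.sum(axis=(-2, -1)) + gt.sum(axis=(-2, -1)) - intersect
    # When both masks are empty the union is 0; count that as a perfect match.
    empty = (union < 1e-6).astype(np.float32)
    return np.clip(intersect / (union + 1e-12) + empty, 0.0, 1.0)

pred = np.array([[[1, 1], [0, 0]],
                 [[0, 0], [0, 0]]], dtype=np.float32)
target = np.array([[[1, 0], [0, 0]],
                   [[0, 0], [0, 0]]], dtype=np.float32)
scores = iou_numpy(pred, target)
```

The alternative is to keep the torch function as it is and convert the stacked masks to tensors before calling it, as suggested in the comment above.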

charigyang commented 2 weeks ago

> How do you filter out just the first mask?

As in, just do f = processMultiSeg(...)[0:1] -- this picks the first one (the one with highest predicted score)

> it seems like they should be torch.stacked

Very likely the case. If this doesn't work let me know and I'll find a fix, otherwise I'll update it shortly.

> the flow_dir directories should point to Flows_gapX directories, right? With the .flo files?

Yes this is correct.
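
On the [0:1] slice suggested earlier in this reply: slicing with [0:1] rather than indexing with [0] keeps the leading mask axis, so downstream code still sees a stack of masks. The sketch below sorts by score explicitly in case the masks are not already ordered; top_mask and its arguments are hypothetical and do not reflect the signature of processMultiSeg.

```python
import numpy as np

def top_mask(masks, scores):
    """Keep only the highest-scoring mask, preserving the leading axis.

    masks: (N, H, W) array; scores: (N,) array of predicted scores.
    """
    order = np.argsort(-np.asarray(scores))  # highest score first
    ranked = np.asarray(masks)[order]
    return ranked[0:1]  # shape (1, H, W), unlike ranked[0] which is (H, W)

masks = np.zeros((3, 4, 4), dtype=bool)
masks[1, :2, :2] = True
best = top_mask(masks, scores=[0.2, 0.9, 0.5])
```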

jes-bro commented 2 weeks ago

When you specify the number of objects in the eval, does that correspond to the number of masks in the segmentation?

charigyang commented 2 weeks ago

That might refer to the GT segmentation? So single-object would be 1, and multi-object would be the maximum number of objects in that dataset.

jes-bro commented 2 weeks ago

What is the emp parameter? Why is it needed? It also looks like it gets passed to iou a bunch but iou doesn't have an arg for it

jes-bro commented 2 weeks ago

> Very likely the case. If this doesn't work let me know and I'll find a fix, otherwise I'll update it shortly.

In terms of seq_level_postprocess.py I haven't been able to get it to produce good masks. They're really pixelated for some reason. The only thing I modified was converting between numpy arrays and torch tensors to resolve errors.

jes-bro commented 2 weeks ago

Would you mind updating the sequential post processing script to match whatever it was when it worked? I would love to be able to reproduce your results. Thank you!

charigyang commented 2 weeks ago

I'll likely have time this weekend to update.

Meanwhile, could you check whether the individual masks are correct, i.e. whether the frame-wise score matches the table?

charigyang commented 2 weeks ago

I've updated the iou in utils.py -- I uploaded the wrong version initially. The emp parameter is just a small detail for handling empty masks consistently with previous literature.

If your masks look pixelated, could you check whether your initial individual mask predictions look reasonable, i.e. whether they match the pretrained model? If not, the issue might also be with the model.