Open jes-bro opened 5 months ago
Hi, thanks for your interest in our work.
This is L66-68 in seq_level_postprocess.py
I'm assuming pred_dir should contain the RGB/FlowP-SAM-based segmentations and flow_pred_dir the flow/FlowI-SAM-based ones? Thanks
That is correct. We will update the code/README shortly to make this clearer.
Thanks so much!
Another question: In the dataset directory structure:
{data_name}/
├── JPEGImages/
│   └── {category_name}/
│       ├── 00000.jpg
│       └── ......
├── FlowImages_gap1/
│   └── {category_name}/
│       ├── 00000.png
│       └── ......
├── ...... (More flow images)
Is it assumed that FlowImages_gap1/00000.png is the flow between JPEG images 00000.jpg and 00001.jpg, and FlowImages_gap2/00000.png would be between 00000.jpg and 00002.jpg?
That's correct; more precisely, FlowImages_gap1/00000.png is the flow from JPEG image 00000.jpg to 00001.jpg.
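To make the convention concrete, here is a trivial sketch (flow_frame_pair is a hypothetical helper, not part of the repo):

```python
def flow_frame_pair(flow_index, gap):
    """(source, target) JPEG frame indices for FlowImages_gap{gap}/{flow_index:05d}.png."""
    return flow_index, flow_index + gap

assert flow_frame_pair(0, 1) == (0, 1)  # FlowImages_gap1/00000.png: 00000.jpg -> 00001.jpg
assert flow_frame_pair(0, 2) == (0, 2)  # FlowImages_gap2/00000.png: 00000.jpg -> 00002.jpg
```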
Also, what should val_seq be? It's set to None in the dataset config and I'm not sure what it represents
Leaving it as None defaults to evaluating on the entire validation set. You may put in your own list of sub-sequences instead if you're debugging something and don't want to evaluate on all sequences, but otherwise just leave it as None.
L35-52 of data/dataloaders/ytvos_loader.py should reflect this example
It looks like seq_level_postprocess is missing the import for read -- where does that come from?
Hi, sorry -- we'll add it to the main version soon, but meanwhile this is the function.
import os

import numpy as np

TAG_FLOAT = 202021.25  # magic number at the start of a valid Middlebury .flo file

def read(file):
    """Read a Middlebury .flo optical flow file into an (H, W, 2) float32 array."""
    assert type(file) is str, "file is not str %r" % str(file)
    assert os.path.isfile(file), "file does not exist %r" % str(file)
    assert file[-4:] == '.flo', "file ending is not .flo %r" % file[-4:]
    with open(file, 'rb') as f:
        flo_number = np.fromfile(f, np.float32, count=1)[0]
        assert flo_number == TAG_FLOAT, 'Flow number %r incorrect. Invalid .flo file' % flo_number
        w = np.fromfile(f, np.int32, count=1)
        h = np.fromfile(f, np.int32, count=1)
        # Interleaved (u, v) components, row-major
        data = np.fromfile(f, np.float32, count=2 * w[0] * h[0])
    # Reshape data into a 3D array (rows, columns, bands)
    flow = np.resize(data, (int(h[0]), int(w[0]), 2))
    return flow
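As a self-contained sanity check, here is a hedged round-trip sketch: write_flo is a hypothetical writer (not from the repo) that emits the Middlebury .flo layout the read() function above expects, with a compact reader reproduced inline so the snippet runs on its own.

```python
import os
import struct
import tempfile

import numpy as np

TAG_FLOAT = 202021.25  # standard Middlebury .flo magic number

def write_flo(path, flow):
    """Write an (H, W, 2) float32 flow array in Middlebury .flo layout (hypothetical helper)."""
    h, w = flow.shape[:2]
    with open(path, "wb") as f:
        f.write(struct.pack("f", TAG_FLOAT))   # magic number
        f.write(struct.pack("ii", w, h))       # width, height as int32
        flow.astype(np.float32).tofile(f)      # interleaved (u, v), row-major

def read(file):
    """Compact version of the reader above, reproduced so this snippet is self-contained."""
    with open(file, "rb") as f:
        assert np.fromfile(f, np.float32, count=1)[0] == TAG_FLOAT
        w = int(np.fromfile(f, np.int32, count=1)[0])
        h = int(np.fromfile(f, np.int32, count=1)[0])
        data = np.fromfile(f, np.float32, count=2 * w * h)
    return data.reshape(h, w, 2)

path = os.path.join(tempfile.mkdtemp(), "00000.flo")
flow = np.random.rand(4, 6, 2).astype(np.float32)
write_flo(path, flow)
assert np.allclose(read(path), flow)  # round-trip preserves the flow exactly
```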
Is seq_level_postprocess how you generated the figures on the website? As in the segmentation overlaid with the RGB?
To create an overlay figure, use the mask output together with cv2.addWeighted; I don't think that overlay code is in the repo.
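For illustration, a minimal overlay sketch: cv2.addWeighted(src1, alpha, src2, beta, 0) computes alpha*src1 + beta*src2, and this NumPy-only version (overlay_mask is a hypothetical helper, not from the repo) emulates that blend on just the masked pixels.

```python
import numpy as np

def overlay_mask(rgb, mask, color=(0, 255, 0), alpha=0.5):
    """Blend a binary mask onto an RGB image.

    Equivalent to cv2.addWeighted(rgb, 1 - alpha, colored, alpha, 0)
    restricted to the masked pixels (unmasked pixels stay untouched).
    """
    out = rgb.astype(np.float32).copy()
    colored = np.zeros_like(out)
    colored[mask > 0] = color
    blend = (1 - alpha) * out + alpha * colored
    out[mask > 0] = blend[mask > 0]
    return out.astype(np.uint8)

img = np.full((4, 4, 3), 100, np.uint8)   # flat grey image
mask = np.zeros((4, 4), np.uint8)
mask[1:3, 1:3] = 1                        # central 2x2 segment
over = overlay_mask(img, mask)
```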
Another question, in some of the figures on the website, it says flowISAM (sequential), does that mean the sequential post processing script was used to generate them? Thanks!
For single-object cases we just use the first (highest-scoring) mask in every frame, so there is no need to run the script. For multi-object cases, yes.
How do you filter out just the first mask?
Also, where does the sum function used here come from?
import torch

def iou(masks, gt, thres=0.5):
    """ IoU between thresholded predicted masks and ground-truth masks """
    masks = (masks > thres).float()
    gt = (gt > thres).float()
    intersect = (masks * gt).sum(dim=[-2, -1])
    union = masks.sum(dim=[-2, -1]) + gt.sum(dim=[-2, -1]) - intersect
    # An empty union (both masks empty) counts as a perfect match
    empty = (union < 1e-6).float()
    iou = torch.clip(intersect / (union + 1e-12) + empty, 0., 1.)
    return iou
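For intuition, here is a NumPy analogue of the iou above (iou_np is hypothetical; the repo version operates on torch tensors). It shows the per-mask reduction over the last two spatial dims and the empty-mask handling:

```python
import numpy as np

def iou_np(masks, gt, thres=0.5):
    """NumPy sketch of the torch iou: per-mask IoU over the last two dims."""
    masks = (masks > thres).astype(np.float32)
    gt = (gt > thres).astype(np.float32)
    intersect = (masks * gt).sum(axis=(-2, -1))
    union = masks.sum(axis=(-2, -1)) + gt.sum(axis=(-2, -1)) - intersect
    empty = (union < 1e-6).astype(np.float32)  # both masks empty -> IoU 1
    return np.clip(intersect / (union + 1e-12) + empty, 0.0, 1.0)

a = np.zeros((1, 4, 4)); a[0, :2, :] = 1   # top half of the frame
b = np.zeros((1, 4, 4)); b[0, :, :2] = 1   # left half of the frame
# overlap 4 px, union 12 px -> IoU 1/3
result = iou_np(a, b)
```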
I got this error:
/flowsam/utils.py", line 21, in iou
intersect = (masks * gt).sum(dim=[-2, -1])
TypeError: _sum() got an unexpected keyword argument 'dim'
I also got an error about floats in iou. Is that version of the iou function up to date?
And just to double check, the flow_dir directories should point to Flows_gapX directories, right? With the .flo files? Thanks!
I think in seq_level_postprocess.py the masks are being np.stacked together, but it seems like they should be torch.stacked, since iou assumes masks is a tensor?
How do you filter out just the first mask?
As in, just do f = processMultiSeg(...)[0:1] -- this picks the first mask (the one with the highest predicted score)
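A toy illustration of that slice (the (num_masks, H, W) array is a stand-in, assuming masks are ordered by predicted score):

```python
import numpy as np

# Stand-in for a per-frame multi-mask prediction: (num_masks, H, W),
# assumed ordered by predicted score, highest first.
masks = np.random.rand(5, 8, 8)

first = masks[0:1]  # slice, not index: keeps the leading axis -> shape (1, 8, 8)
```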
it seems like they should be torch.stacked
Very likely the case. If this doesn't work let me know and I'll find a fix, otherwise I'll update it shortly.
the flow_dir directories should point to Flows_gapX directories, right? With the .flo files?
Yes this is correct.
When you specify the number of objects in the eval, does that correspond to the number of masks in the segmentation?
That should correspond to the number of objects in the GT segmentation, I believe: single-object would be 1, and multi-object would be the maximum number of objects in that dataset.
What is the emp parameter, and why is it needed? It also looks like it gets passed to iou in several places, but iou doesn't have an argument for it.
As for seq_level_postprocess.py: I haven't been able to get it to produce good masks; they're really pixelated for some reason. The only thing I modified was converting between numpy arrays and torch tensors to resolve the errors.
Would you mind updating the sequential post processing script to match whatever it was when it worked? I would love to be able to reproduce your results. Thank you!
I'll likely have time this weekend to update.
Meanwhile, could you check whether the individual masks are correct, i.e. whether the frame-wise scores match the table?
I've updated the iou in utils.py -- I had uploaded the wrong version initially. The emp flag is just a small consistency tweak in how empty masks are handled (to stay consistent with previous literature).
If your masks look pixelated -- could you check whether your initial individual mask predictions look reasonable, i.e. match the pretrained model's outputs? If not, the issue might also be with the model.
I have one question: what should I set the "gap" to?
The paper mentions "layering the motion segmentation masks behind RGB-based segmentation masks." I don't see any layering in the eval script. How/where do you perform inference for FlowP-SAM + FlowI-SAM?