Open aleemsidra opened 10 months ago
There shouldn't be an inversion step normally. I would guess that the prompt given to the model is pointing to a part of the background, and if the background is all a similar color/appearance, then the model segments it as the 'answer' for the given prompt.
If the prompt is selecting the skull, then there may be a normalization issue with the prompt coordinates. For example, if a single (0.5, 0.5) point prompt is given to select the skull but the model is expecting pixel units, then (0.5, 0.5) points at the top-left corner of the image, which would select the background in this case. The solution may just be a matter of scaling the coordinates differently (or using .predict(...) vs. .predict_torch(...), which assume different units).
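As a rough sketch of what I mean (here `image` and `predictor` are placeholders for your own HxWx3 uint8 image and a `SamPredictor` instance), the scaling would look something like:

```python
import numpy as np

# Sketch only: convert a normalized (0.5, 0.5) center prompt into pixel units
# before calling predictor.predict(), which expects pixel coordinates.
H, W = image.shape[:2]

norm_point = np.array([[0.5, 0.5]])            # (x, y) in normalized [0, 1] units
pixel_point = norm_point * np.array([[W, H]])  # scaled to pixel units -> (W/2, H/2)

predictor.set_image(image)
masks, scores, logits = predictor.predict(
    point_coords=pixel_point,
    point_labels=np.array([1]),   # 1 = foreground point
)
```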
@heyoeyo, I did not pass any prompts. I am just passing the input image to the model as:
```python
# dataset, prepare_image, resize_transform, and model are defined earlier in the script
from tqdm import tqdm
import torch

for idx in tqdm(range(len(dataset)), desc="Processing images", unit="image"):
    input_samples, gt_samples, voxel = dataset[idx]
    slices = []
    for slice_id, img_slice in tqdm(enumerate(input_samples), total=len(input_samples), desc="Processing slices"):  # looping over the slices of a single image
        batched_input = [
            {'image': prepare_image(img_slice, resize_transform, model),
             'original_size': img_slice[0, :, :].shape}]
        preds = model(batched_input)
        slices.append(preds.squeeze().detach().cpu())
    segmented_volume = torch.stack(slices, dim=0)  # stacking the 2D slices into a volume
    mask = torch.zeros(segmented_volume.shape)
    mask[torch.sigmoid(segmented_volume) > 0.5] = 1  # thresholding
```
I have commented out L123 in sam.py because I want the logits to compute the Dice score, so I am returning only the masks. The rest of the code is exactly the same. Even if I keep L123 in sam.py, I get the inverted result.
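For reference, the kind of Dice computation I want the logits for looks roughly like this (a sketch, not my exact metric code):

```python
import torch

def dice_score(pred_logits: torch.Tensor, gt: torch.Tensor, eps: float = 1e-6) -> float:
    # Sketch of a plain (volumetric) Dice overlap from raw logits and a binary GT mask;
    # not the surface Dice metric, just the same sigmoid > 0.5 thresholding as above.
    pred = (torch.sigmoid(pred_logits) > 0.5).float()
    gt = (gt > 0.5).float()
    intersection = (pred * gt).sum()
    return (2.0 * intersection / (pred.sum() + gt.sum() + eps)).item()

# If dice_score(logits, gt) is ~0 but dice_score(-logits, gt) is high,
# the predicted mask is simply flipped.
```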
If no prompts are provided, it looks like the output masks will be based on the learned no-mask embedding only. It's not really clear what parts of the image the no-mask embedding would tend to select, but I'd guess it doesn't specifically favor the center of the image and that's why the background gets selected.
A simple way to check if things are inverted would be to provide a point prompt near the center and see if the resulting mask includes/excludes that point.
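With the batched interface you're already using, that check could look roughly like this (a sketch; it assumes `resize_transform` is the standard `ResizeLongestSide`, so the point can be mapped the same way as the image):

```python
import torch

# Sketch: add a single foreground point at the center of the slice to the batched input.
# Coordinates are given in pixels of the original image, then mapped through the same
# resize transform that was applied to the image itself.
h, w = img_slice[0, :, :].shape
center = torch.tensor([[[w / 2.0, h / 2.0]]], device=model.device)  # shape 1x1x2, (x, y)
center = resize_transform.apply_coords_torch(center, (h, w))
labels = torch.ones((1, 1), dtype=torch.int, device=model.device)   # 1 = foreground

batched_input = [{
    'image': prepare_image(img_slice, resize_transform, model),
    'original_size': (h, w),
    'point_coords': center,
    'point_labels': labels,
}]
preds = model(batched_input)
```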
I passed box prompts, and that solved the issue.
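Roughly what that looks like with the same batched interface (a sketch; the hard-coded box here is just a placeholder, in practice I draw it around the skull):

```python
import torch

# Sketch: pass an (x1, y1, x2, y2) box prompt, mapped through the same resize transform.
h, w = img_slice[0, :, :].shape
box = torch.tensor([[10.0, 10.0, w - 10.0, h - 10.0]], device=model.device)  # placeholder box
box = resize_transform.apply_boxes_torch(box, (h, w))

batched_input = [{
    'image': prepare_image(img_slice, resize_transform, model),
    'original_size': (h, w),
    'boxes': box,
}]
preds = model(batched_input)
```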
I have a grayscale skull image. Since SAM expects the RGB image format, I converted my image to RGB mode and then scaled it.
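The conversion was along these lines (a rough sketch of the idea, not my exact code; `gray_slice` is just a placeholder for one slice):

```python
import numpy as np

# Sketch: replicate the single grayscale channel into 3 channels and rescale to uint8,
# since SAM expects an HxWx3 uint8 RGB image.
gray = gray_slice.astype(np.float32)                      # gray_slice: HxW grayscale array
gray = (gray - gray.min()) / (gray.max() - gray.min() + 1e-8)
rgb_image = (np.stack([gray, gray, gray], axis=-1) * 255).astype(np.uint8)  # HxWx3
```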
SAM's output has the same dimensions as the original input, which means it is grayscale, so I do not convert the ground truth to RGB for the Dice calculation. In the ground truth, the brain skull is white and the background is black, but SAM's output has the pixels reversed, as shown below. It should be white for the skull and black for the background.
Because of this inversion, the surface Dice score I calculate comes out to be zero. Can someone please pinpoint where in SAM's code this inversion is happening?