Fine-tuning Oneformer - Githubissues

nickponline commented 11 months ago

The process for fine-tuning OneFormer seems different to MaskFormer and Mask2Former. No matter what I try, I can't get the model to work. Here's an example, which I feel should work for semantic segmentation:


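from transformers import AutoProcessor, AutoModelForUniversalSegmentation

# `images` and `segmentation_maps` are tuples of NumPy arrays, and `config.id2label`
# is the label mapping; all three are defined elsewhere in the original script.
preprocessor = AutoProcessor.from_pretrained("shi-labs/oneformer_coco_swin_large", num_text=1)
model = AutoModelForUniversalSegmentation.from_pretrained(
    "shi-labs/oneformer_coco_swin_large",
    id2label=config.id2label,
    ignore_mismatched_sizes=True,  # re-initialise the class head for the new label set
)

print("*** Inputs ***")
print(type(images), len(images), type(images[0]), images[0].shape)
print(type(segmentation_maps), len(segmentation_maps), type(segmentation_maps[0]), segmentation_maps[0].shape)

# One "semantic" task input per image; the maps become mask_labels / class_labels
batch = preprocessor(
    images,
    ["semantic"] * len(images),
    segmentation_maps=segmentation_maps,
    return_tensors="pt",
)

print("*** Batch ***")
for k, v in batch.items():
    print(k, type(v), v.shape if hasattr(v, "shape") else len(v))

outputs = model(**batch)  # Crashes here

for k, v in outputs.items():
    print(k)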

This crashes with the following output:

*** Inputs ***
<class 'tuple'> 1 <class 'numpy.ndarray'> (256, 256)
<class 'tuple'> 1 <class 'numpy.ndarray'> (256,)
*** Batch ***
pixel_values <class 'torch.Tensor'> torch.Size([1, 3, 800, 800])
pixel_mask <class 'torch.Tensor'> torch.Size([1, 800, 800])
mask_labels <class 'list'> 1
class_labels <class 'list'> 1
text_inputs <class 'torch.Tensor'> torch.Size([1, 1, 77])
task_inputs <class 'torch.Tensor'> torch.Size([1, 77])
/opt/anaconda3/envs/dev/lib/python3.10/site-packages/torch/functional.py:505: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3489.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%|          | 0/71 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "/Users/nickp/mr/oneformer/main.py", line 199, in <module>
    outputs = model(**batch)
  File "/opt/anaconda3/envs/dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/anaconda3/envs/dev/lib/python3.10/site-packages/transformers/models/oneformer/modeling_oneformer.py", line 3221, in forward
    loss_dict: Dict[str, Tensor] = self.get_loss_dict(
  File "/opt/anaconda3/envs/dev/lib/python3.10/site-packages/transformers/models/oneformer/modeling_oneformer.py", line 3076, in get_loss_dict
    loss_dict: Dict[str, Tensor] = self.criterion(
  File "/opt/anaconda3/envs/dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/anaconda3/envs/dev/lib/python3.10/site-packages/transformers/models/oneformer/modeling_oneformer.py", line 690, in forward
    indices = self.matcher(masks_queries_logits, class_queries_logits, mask_labels, class_labels)
  File "/opt/anaconda3/envs/dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/anaconda3/envs/dev/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/anaconda3/envs/dev/lib/python3.10/site-packages/transformers/models/oneformer/modeling_oneformer.py", line 306, in forward
    cost_class = -pred_probs[:, labels]
IndexError: index 3 is out of bounds for dimension 0 with size 3
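The thread never pins down the cause of this `IndexError`. One plausible reading (an assumption, not confirmed here) is that the segmentation maps contain a label value (3) with no entry in `config.id2label`, so the class-cost lookup indexes past the available class logits. A quick sanity check to run before calling the model, sketched with NumPy; `find_out_of_range_labels` and its `ignore_index=255` default are hypothetical:

```python
import numpy as np


def find_out_of_range_labels(segmentation_maps, id2label, ignore_index=255):
    """Return the sorted label values that have no entry in id2label.

    Assumes id2label keys are integer class ids (or their string form, as
    when loaded from JSON) and that `ignore_index` marks pixels to skip.
    """
    valid_ids = {int(k) for k in id2label}
    bad = set()
    for seg_map in segmentation_maps:
        # Unique pixel values in this map, as plain Python numbers
        for value in np.unique(np.asarray(seg_map)).tolist():
            if value != ignore_index and value not in valid_ids:
                bad.add(value)
    return sorted(bad)


id2label = {0: "background", 1: "road"}
maps = (np.array([[0, 1], [3, 255]], dtype=np.uint8),)
print(find_out_of_range_labels(maps, id2label))  # [3]
```

If this returns anything, either extend `id2label` or remap those values before passing the maps to the processor.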

@werner-rammer did you have any success?

nickponline commented 11 months ago

@NielsRogge are we missing something here?

SVDmodel commented 11 months ago

Unfortunately not - I tried a couple of things, but it never worked :(

nickponline commented 11 months ago

I've tried to get this working, starting with the same image and mask:

First I tried:

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForUniversalSegmentation

image = Image.open('image.jpg').convert('RGB')
mask = Image.open('mask.png').convert('L')
processor = AutoProcessor.from_pretrained("shi-labs/oneformer_coco_swin_large")
semantic_inputs = processor(images=image, segmentation_maps=mask, task_inputs=["semantic"], return_tensors="pt")
processor.tokenizer.batch_decode(semantic_inputs.task_inputs)  # decode the task tokens to inspect them
model = AutoModelForUniversalSegmentation.from_pretrained("shi-labs/oneformer_coco_swin_large")

with torch.no_grad():
    outputs = model(**semantic_inputs)

# target_sizes expects (height, width); PIL's image.size is (width, height)
semantic_segmentation = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]

This gives the error:

texts = ["a semantic photo"] * self.num_text
TypeError: can't multiply sequence by non-int of type 'NoneType'
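The failing line builds one text prompt per text query, so it needs an integer `num_text`; when the processor is loaded without one, the value is `None`. A minimal pure-Python reproduction, where `build_text_list` is a hypothetical stand-in for the processor's internal code:

```python
def build_text_list(task, num_text):
    # Hypothetical stand-in for the failing processor line:
    # one "a <task> photo" prompt per text query.
    return [f"a {task} photo"] * num_text


try:
    build_text_list("semantic", None)
except TypeError as err:
    print("TypeError:", err)  # can't multiply sequence by non-int of type 'NoneType'

print(build_text_list("semantic", 2))  # ['a semantic photo', 'a semantic photo']
```

Passing `num_text=1` to `from_pretrained`, as in the next attempt, supplies that integer.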

So I tried:

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForUniversalSegmentation

image = Image.open('image.jpg').convert('RGB')
mask = Image.open('mask.png').convert('L')
processor = AutoProcessor.from_pretrained("shi-labs/oneformer_coco_swin_large", num_text=1)
semantic_inputs = processor(images=image, segmentation_maps=mask, task_inputs=["semantic"], return_tensors="pt")
processor.tokenizer.batch_decode(semantic_inputs.task_inputs)  # decode the task tokens to inspect them
model = AutoModelForUniversalSegmentation.from_pretrained("shi-labs/oneformer_coco_swin_large")

with torch.no_grad():
    outputs = model(**semantic_inputs)

# target_sizes expects (height, width); PIL's image.size is (width, height)
semantic_segmentation = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]

The error is then:

text_queries = nn.functional.normalize(text_queries.flatten(1), dim=-1)
AttributeError: 'NoneType' object has no attribute 'flatten'
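Read together with the later comments about `is_training`, a likely explanation is that passing `segmentation_maps` produces labels, labels make the model compute a loss, and that loss needs text queries which are only built when the training flag is on. The toy class below is a hypothetical mock of that control flow (NumPy arrays standing in for tensors), not the transformers implementation:

```python
import numpy as np


class ToyOneFormer:
    """Hypothetical mock of the control flow discussed in this thread."""

    def __init__(self, is_training=False):
        self.is_training = is_training

    def __call__(self, text_inputs, with_labels):
        # Assumption: text queries are only computed in training mode.
        text_queries = np.asarray(text_inputs, dtype=float) * 2.0 if self.is_training else None
        if not with_labels:
            return None  # no labels, no loss: text queries are never used
        # The loss flattens the text queries; with None this raises
        # AttributeError: 'NoneType' object has no attribute 'flatten'
        return float(text_queries.flatten().sum())


print(ToyOneFormer(is_training=False)([[1.0, 2.0]], with_labels=False))  # None
print(ToyOneFormer(is_training=True)([[1.0, 2.0]], with_labels=True))    # 6.0
try:
    ToyOneFormer(is_training=False)([[1.0, 2.0]], with_labels=True)
except AttributeError as err:
    print("AttributeError:", err)
```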

@NielsRogge @praeclarumjj3 does this help? Perhaps I'm missing something from the docs?

NielsRogge commented 10 months ago

A notebook has now been uploaded! https://github.com/NielsRogge/Transformers-Tutorials/blob/master/OneFormer/Fine_tune_OneFormer_for_semantic_segmentation.ipynb.

Thanks for pinging me on this.

werner-rammer commented 10 months ago

@NielsRogge - thanks a lot for providing a tutorial! Will try ASAP!

nickponline commented 10 months ago

This seems to work, although it doesn't seem like you can calculate the loss if is_training=False. Is there a way to calculate it, for example for the validation loss?

nickponline commented 10 months ago

Actually, never mind, you can: validation is still training :) Can close!
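In other words, a validation loss can be computed as long as the training flag stays on during the validation forward pass; you simply skip the optimizer step (and, in PyTorch, wrap the pass in `torch.no_grad()`). A small helper that switches the flag on for one block and restores it afterwards; the helper name is hypothetical, and the flag is assumed to be a plain attribute called `is_training` on whatever object you pass in:

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def training_flag(module, attr="is_training"):
    """Hypothetical helper: temporarily set a boolean flag to True, then restore it."""
    previous = getattr(module, attr)
    setattr(module, attr, True)
    try:
        yield module
    finally:
        # Restore the previous value even if the validation step raises.
        setattr(module, attr, previous)


# Usage with a stand-in object; in a real loop this would wrap the
# validation forward pass (under torch.no_grad(), with no optimizer step).
model = SimpleNamespace(is_training=False)
with training_flag(model):
    print(model.is_training)  # True
print(model.is_training)  # False
```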

NielsRogge / Transformers-Tutorials

Fine-tuning Oneformer #365