NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.

Train Mask2Former on custom dataset #243

Closed lombardata closed 1 year ago

lombardata commented 1 year ago

Hi, I would like to train Mask2Former on my own dataset. I have COCO JSON files for the training and validation annotations and a folder with all the images, and I've already successfully registered the dataset with other libraries (like Detectron2, on which Mask2Former is built), but I don't see where I can register (and then use) my own dataset in: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/MaskFormer/Fine-tuning/Fine_tuning_MaskFormerForInstanceSegmentation_on_semantic_sidewalk.ipynb Does anyone have a suggestion? Thank you in advance :)

NielsRogge commented 1 year ago

Hi,

There's no notion of "registration" like in other libraries, we make it a lot easier ;)

Instead, you just need to create a regular PyTorch dataset, as illustrated in the notebook that you link to. A PyTorch Dataset is a simple Python class that inherits from torch.utils.data.Dataset and needs to implement 2 methods: 1) __getitem__, to return one item of the dataset, and 2) __len__, to return the length of the dataset. As can be seen in the notebook, we assume that your semantic segmentation dataset consists of (image, ground truth segmentation map) pairs, where each ground truth segmentation map is a PNG file storing the labels per pixel. An example of such a dataset can be seen here.

Next, one can create a corresponding PyTorch dataloader which gives you batches of the dataset.
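
A minimal sketch of both pieces (the folder layout, the assumption that each mask shares its image's file name, and the batch size are just placeholders):

import os

import numpy as np
from PIL import Image
from torch.utils.data import Dataset, DataLoader


class SemanticSegmentationDataset(Dataset):
    """(image, ground truth segmentation map) pairs; each mask shares its image's file name."""

    def __init__(self, image_dir, mask_dir):
        self.image_dir = image_dir
        self.mask_dir = mask_dir
        self.filenames = sorted(os.listdir(image_dir))

    def __len__(self):
        # length of the dataset = number of (image, mask) pairs
        return len(self.filenames)

    def __getitem__(self, idx):
        # one item = an RGB image and its per-pixel label map (a PNG storing class ids)
        name = self.filenames[idx]
        image = np.array(Image.open(os.path.join(self.image_dir, name)).convert("RGB"))
        segmentation_map = np.array(Image.open(os.path.join(self.mask_dir, name)))
        return image, segmentation_map


train_dataset = SemanticSegmentationDataset("images/train", "annotations/train")
train_dataloader = DataLoader(train_dataset, batch_size=2, shuffle=True,
                              collate_fn=lambda batch: tuple(zip(*batch)))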

Which task would you like to train Mask2Former for: instance, semantic or panoptic segmentation?

lombardata commented 1 year ago

Ok, thank you very much. I would like to train Mask2Former for the instance segmentation task. Since I already have a COCO JSON annotation file, I would like to keep these annotations (or convert them into ground truth segmentation maps). I've found this article: https://medium.com/fullstackai/how-to-train-an-object-detector-with-your-own-coco-dataset-in-pytorch-319e7090da5 But the PyTorch dataset they build has some differences with the one you showed (e.g. the __getitem__ method returns img, my_annotation instead of image, segmentation_map, original_image, original_segmentation_map). Do you know a simple method that would allow me to create a dataset with the characteristics needed to use Mask2Former? Thank you again

NielsRogge commented 1 year ago

COCO stores the instance segmentations in a polygon format, you can use this function to convert them to binary masks. This is what you need to train Mask2Former: for each image, you need a set of binary masks + their corresponding class labels.

Mask2Former's image processor internally converts the segmentation maps (PNGs) to a set of binary masks as seen here.

So I'd advise preparing the images + targets yourself in the __getitem__ method, using the function linked above, rather than using Mask2FormerImageProcessor. This involves 2 things: 1) resizing + normalizing the image to pixel_values, and 2) creating a set of binary masks and corresponding class labels (called mask_labels and class_labels).
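
Roughly, such a __getitem__ could look like this (just a sketch, not the exact notebook code; it uses pycocotools to rasterize the polygons, and the target size and normalization statistics are only example choices):

import numpy as np
import torch
from PIL import Image
from pycocotools.coco import COCO
from torch.utils.data import Dataset

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])  # example normalization statistics
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


class CocoInstanceDataset(Dataset):
    def __init__(self, annotation_file, image_dir, size=(512, 512)):
        self.coco = COCO(annotation_file)
        self.image_dir = image_dir
        self.image_ids = self.coco.getImgIds()
        self.size = size  # (height, width)

    def __len__(self):
        return len(self.image_ids)

    def __getitem__(self, idx):
        image_id = self.image_ids[idx]
        info = self.coco.loadImgs(image_id)[0]
        image = Image.open(f"{self.image_dir}/{info['file_name']}").convert("RGB")
        anns = self.coco.loadAnns(self.coco.getAnnIds(imgIds=image_id))

        # 1) resize + normalize the image to pixel_values
        image = image.resize(self.size[::-1])  # PIL expects (width, height)
        pixel_values = (np.array(image) / 255.0 - IMAGENET_MEAN) / IMAGENET_STD
        pixel_values = torch.from_numpy(pixel_values).permute(2, 0, 1).float()

        # 2) one binary mask + one class label per object instance
        masks, labels = [], []
        for ann in anns:
            mask = self.coco.annToMask(ann)  # polygon/RLE -> HxW binary mask
            mask = np.array(Image.fromarray(mask).resize(self.size[::-1], Image.NEAREST))
            masks.append(torch.from_numpy(mask).float())
            labels.append(ann["category_id"])

        mask_labels = torch.stack(masks) if masks else torch.zeros((0, *self.size))
        class_labels = torch.tensor(labels, dtype=torch.int64)
        return pixel_values, mask_labels, class_labels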

The reason I included the original image and original segmentation map is to compute metrics like mIoU.

Let me know if you need any help! We can perhaps create an example to train a Mask2Former model starting from instance annotations in COCO format.

cc @alaradirik

lombardata commented 1 year ago

Thank you very much, I would really appreciate your help in writing an example to train a Mask2Former model starting from instance annotations in COCO format. I've sent you an email!

alaradirik commented 1 year ago

Hi @lombardata, we will be releasing an extensive tutorial on fine-tuning MaskFormer for instance segmentation very soon - most likely next week. The tutorial will cover how to preprocess your data and what the model expects in detail, and the pipeline is completely the same for Mask2Former.

@NielsRogge already gave great pointers, but I'd also recommend converting COCO annotations to ground truth segmentation maps and uploading your dataset to the hub (it can be set to private) to use the datasets library, which would really simplify your workflow. You can easily create and push your dataset from your local machine by following this example.

lombardata commented 1 year ago

Ok, thank you very much @alaradirik and @NielsRogge, I'll wait for it! So you recommend converting COCO annotations to segmentation maps and then segmentation maps to binary masks? I understood that @NielsRogge wanted to convert COCO annotations directly to binary masks, without passing through the segmentation map step.

alaradirik commented 1 year ago

@lombardata for instance segmentation, I'd recommend using a format similar to the Scene Parsing dataset, which stores instance segmentation maps as RGB images. It uses the R (Red) channel to store the (HxW) instance id map, the G (Green) channel to store the (HxW) category id map, and sets the B (Blue) channel to all zeros.
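
Concretely, reading such a map back into per-pixel id maps would look something like this (a sketch following the channel convention described above; the file name is a placeholder):

import numpy as np
from PIL import Image

# R channel: per-pixel instance ids, G channel: per-pixel category ids, B channel: zeros
annotation = np.array(Image.open("example_instance_map.png").convert("RGB"))
instance_id_map = annotation[:, :, 0]
category_id_map = annotation[:, :, 1]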

But it's completely up to you of course :)

lombardata commented 1 year ago

Thanks a lot @alaradirik. Given my unfamiliarity with annotation formats, if @NielsRogge's invitation to create a dataset from COCO annotations is still valid, I would be very interested in creating a function that turns COCO annotations into segmentation maps. Given the extensive use of the COCO format in computer vision tasks, I think this feature would benefit many people :) Thanks again for the splendid work you do for the community

NielsRogge commented 1 year ago

@lombardata maybe you can create a tiny dataset of 10 images with corresponding instance annotations in COCO format. You can use the metadata feature of ImageFolder to turn it into a HuggingFace dataset. We can create a tutorial based on that.

Let me know whether you need any help.

lombardata commented 1 year ago

Perfect @NielsRogge, I can make a function to export COCO annotations in the following format:

{"file_name": "0001.png", "objects": {"bbox": [[302.0, 109.0, 73.0, 52.0]], "categories": [0]}}
{"file_name": "0002.png", "objects": {"bbox": [[810.0, 100.0, 57.0, 28.0]], "categories": [1]}}
{"file_name": "0003.png", "objects": {"bbox": [[160.0, 31.0, 248.0, 616.0], [741.0, 68.0, 202.0, 401.0]], "categories": [2, 2]}}

and then load the annotation file through the load_dataset function, but since I'm working on a segmentation problem, I also need a segmentation field in the JSON file. Once we have a HuggingFace dataset, will it be sufficient to import it, with the rest of the tutorial staying the same as for datasets annotated in PNG format?

NielsRogge commented 1 year ago

since I'm working on a segmentation problem, I need also a segmentation field in the JSON file.

I assumed that you have those, as instance segmentation in COCO format has a "segmentation" key in the annotations. Each annotation in COCO format looks like this:

{
        "segmentation": [[510.66,423.01,511.72,420.03,...,510.45,423.01]],
        "area": 702.1057499999998,
        "iscrowd": 0,
        "image_id": 289343,
        "bbox": [473.07,395.93,38.65,28.67],
        "category_id": 18,
        "id": 1768
    }
lombardata commented 1 year ago

I assumed that you have those

Yes, of course I have it. I meant that in the metadata feature of ImageFolder there is no segmentation field, only a bbox one.

NielsRogge commented 1 year ago

The metadata feature of ImageFolder supports any custom key, the "object detection" section in the docs is just an example. As long as you have a "file_name" key (to be able to link the metadata to a corresponding image in your folder), it's fine.
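
For example, a metadata.jsonl line with a custom "segmentation" key could look like the commented line below, and the folder can then be loaded with the imagefolder builder (the folder path is a placeholder):

# a line of metadata.jsonl (one JSON object per line, sitting next to the images):
# {"file_name": "0001.png", "objects": {"bbox": [[302.0, 109.0, 73.0, 52.0]], "segmentation": [[...]], "categories": [0]}}

from datasets import load_dataset

dataset = load_dataset("imagefolder", data_dir="path/to/your/folder", split="train")
print(dataset[0]["objects"])  # the custom keys come back as a regular column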

lombardata commented 1 year ago

Ok. So how do you propose to structure the annotation file? Something like : {"file_name": "0001.png", "objects": {"bbox": [[302.0, 109.0, 73.0, 52.0]], "segmentation": [[1134,454,1134,454,1132,453,1131,453,1131,453,1130,452,1130,452,1129,452,1128,452,1128,452,1128,452,1127,451,1127,451,1127,450,1121,449,1121,449,1121,449,1120,449,1120,449,1119,449,1118,448,1118,448,11185,462,1115,462,1114,461,1114,461,1114,461,1114,460,1113,459,1113,459,1113,459,1113,458,1113,457,1113,457,1113,456,1113,455,1113,455,1113,455,1113,454,1112,454,1112,453,1112,453,1112,453,1112,453,1112,452,1112,452,1112,452,1112,452,1112,452,1113,452,1113,452,1113,452,1114,452,1114,451,1114,451,1114,451,1114,451,1114,451,1114,451,1134,454]],"area": 3238.036800000004,"iscrowd": 0,"categories": [0]}} ?

NielsRogge commented 1 year ago

Yes, indeed!

lombardata commented 1 year ago

Hi @NielsRogge, I've prepared the dataset, but I still have some problems loading it with the load_dataset function (it can't read the information in metadata.jsonl). Is this really the structure of metadata.jsonl: {"file_name": "0001.png", "additional_feature": "This is a first value of a text feature you added to your images"} {"file_name": "0002.png", "additional_feature": "This is a second value of a text feature you added to your images"} {"file_name": "0003.png", "additional_feature": "This is a third value of a text feature you added to your images"} ? Or are the objects in the JSON file comma separated? I get the following error: ArrowInvalid: JSON parse error: Missing a name for object member. in row 0

lombardata commented 1 year ago

Ok, I needed to dump each line of the JSON Lines file before writing it, like here. My fault!
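
For reference, the fix boils down to serializing each record with json.dumps before writing the line (a sketch; records here is just a placeholder list of dicts):

import json

with open("metadata.jsonl", "w") as f:
    for record in records:  # each record is a dict containing at least a "file_name" key
        f.write(json.dumps(record) + "\n")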

lombardata commented 1 year ago

@lombardata maybe you can create a tiny dataset of 10 images with corresponding instance annotations in COCO format. You can use the metadata feature of ImageFolder to turn it into a HuggingFace dataset. We can create a tutorial based on that.

Let me know whether you need any help.

Hi @NielsRogge , I've successfully created a HuggingFace dataset. How do you want to proceed in order to create a tutorial based on that? Thank you in advance and have a good day!

NielsRogge commented 1 year ago

If the dataset can be public, could you push it to the hub? Just (assuming you have run huggingface-cli login in a terminal):

dataset.push_to_hub("username/name_of_your_dataset")

Then you can reload it using

from datasets import load_dataset

dataset = load_dataset("username/name_of_your_dataset")
lombardata commented 1 year ago

If the dataset can be public, could you push it to the hub? Just (assuming you have run huggingface-cli login in a terminal):

dataset.push_to_hub("username/name_of_your_dataset")

Then you can reload it using

from datasets import load_dataset

dataset = load_dataset("username/name_of_your_dataset")

Here it is: https://huggingface.co/datasets/lombardata/data_2017/viewer/lombardata--data_2017 I've published only 7 images because at the moment the dataset is still not public, but we are working on a data paper in order to publish it. Is this the correct dataset format? Can we work with it? Have a good day @NielsRogge and thanks again

lombardata commented 1 year ago

Hi all, allow me to ask the question again: any news about the dataset format, @NielsRogge? And what about the extensive tutorial on fine-tuning MaskFormer for instance segmentation, @alaradirik? Have a good day!

alaradirik commented 1 year ago

Hi @lombardata, the blog post + tutorial is ready but it is in collaboration with PyImageSearch, I will let you know the release date shortly

lombardata commented 1 year ago

PyImageSearch

Thank you very much :)

alaradirik commented 1 year ago

Hey @lombardata, the blog post + tutorial is scheduled for March 13th on PyImageSearch, so you will have to wait a bit unfortunately

alaradirik commented 1 year ago

@lombardata the tutorial to fine-tune MaskFormer for instance segmentation is over here: https://pyimagesearch.com/2023/03/13/train-a-maskformer-segmentation-model-with-hugging-face-transformers/

You can simply swap the Config, ImageProcessor and Model class names with the corresponding Mask2Former classes and use the same code :)
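
Concretely, the swap amounts to something like this (the checkpoint name is only an example):

from transformers import Mask2FormerConfig, Mask2FormerImageProcessor, Mask2FormerForUniversalSegmentation

model_name = "facebook/mask2former-swin-tiny-coco-instance"  # example checkpoint
config = Mask2FormerConfig.from_pretrained(model_name)
processor = Mask2FormerImageProcessor.from_pretrained(model_name)
# MaskFormerForInstanceSegmentation -> Mask2FormerForUniversalSegmentation
model = Mask2FormerForUniversalSegmentation.from_pretrained(model_name)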

lombardata commented 1 year ago

@lombardata the tutorial to fine-tune MaskFormer for instance segmentation is over here: https://pyimagesearch.com/2023/03/13/train-a-maskformer-segmentation-model-with-hugging-face-transformers/

You can simply swap the Config, ImageProcessor and Model class names with the corresponding Mask2Former classes and use the same code :)

Hi @alaradirik, I was trying to reproduce the tutorial you published (thank you very much, the explanations are exhaustive!), but when I try to instantiate the model:

model = MaskFormerForInstanceSegmentation(config)

I run into the following error:

AttributeError                            Traceback (most recent call last)
Cell 9 in <module>
      1 # Use the config object to initialize a MaskFormer model with randomized weights
----> 2 model = MaskFormerForInstanceSegmentation(config)
      3 # Replace the randomly initialized model with the pre-trained model weights
      4 base_model = MaskFormerModel.from_pretrained(model_name)

File ~/anaconda3/envs/mask2former/lib/python3.8/site-packages/transformers/models/maskformer/modeling_maskformer.py:1637, in MaskFormerForInstanceSegmentation.__init__(self, config)
   1635 def __init__(self, config: MaskFormerConfig):
   1636     super().__init__(config)
-> 1637     self.model = MaskFormerModel(config)
   1638     hidden_size = config.decoder_config.hidden_size
   1639     # + 1 because we add the "null" class

File ~/anaconda3/envs/mask2former/lib/python3.8/site-packages/transformers/models/maskformer/modeling_maskformer.py:1545, in MaskFormerModel.__init__(self, config)
   1540 self.pixel_level_module = MaskFormerPixelLevelModule(config)
   1541 self.transformer_module = MaskFormerTransformerModule(
   1542     in_features=self.pixel_level_module.encoder.channels[-1], config=config
   1543 )
-> 1545 self.post_init()

File ~/anaconda3/envs/mask2former/lib/python3.8/site-packages/transformers/modeling_utils.py:1090, in PreTrainedModel.post_init(self)
   1085 def post_init(self):
   1086     """
...
    946     return modules[name]
--> 947 raise AttributeError("'{}' object has no attribute '{}'".format(
    948     type(self).__name__, name))

AttributeError: 'MaskFormerFPNConvLayer' object has no attribute 'get_submodule'

Do you have any idea how to fix it? My config is the same as the one defined in the tutorial:

model_name = "facebook/maskformer-swin-base-ade" config = MaskFormerConfig.from_pretrained(model_name) print("[INFO] displaying the MaskFormer configuration...") print(config)

which gives:

MaskFormerConfig { "_commit_hash": "b569351e060953d37fd7dfb8b16ab83c360a13d6", "architectures": [ "MaskFormerForInstanceSegmentation" ], "backbone_config": { "_name_or_path": "", "add_cross_attention": false, "architectures": null, "attention_probs_dropout_prob": 0.0, "bad_words_ids": null, "begin_suppress_tokens": null, "bos_token_id": null, "chunk_size_feed_forward": 0, "cross_attention_hidden_size": null, "decoder_start_token_id": null, "depths": [ 2, 2, 18, 2 ], "diversity_penalty": 0.0, "do_sample": false, "drop_path_rate": 0.3, "early_stopping": false, "embed_dim": 128, "encoder_no_repeat_ngram_size": 0, "encoder_stride": 32, "eos_token_id": null, "exponential_decay_length_penalty": null, "finetuning_task": null, "forced_bos_token_id": null, "forced_eos_token_id": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.0, "hidden_size": 1024, "id2label": { "0": "LABEL_0", "1": "LABEL_1" }, "image_size": 384, "in_channels": 3, "initializer_range": 0.02, "is_decoder": false, "is_encoder_decoder": false, "label2id": { "LABEL_0": 0, "LABEL_1": 1 }, "layer_norm_eps": 1e-05, "length_penalty": 1.0, "max_length": 20, "min_length": 0, "mlp_ratio": 4.0, "model_type": "swin", "no_repeat_ngram_size": 0, "num_beam_groups": 1, "num_beams": 1, "num_channels": 3, "num_heads": [ 4, 8, 16, 32 ], "num_layers": 4, "num_return_sequences": 1, "out_features": [ "stage4" ], "out_indices": [ 4 ], "output_attentions": false, "output_hidden_states": false, "output_scores": false, "pad_token_id": null, "patch_size": 4, "path_norm": true, "prefix": null, "pretrain_img_size": 384, "problem_type": null, "pruned_heads": {}, "qkv_bias": true, "remove_invalid_values": false, "repetition_penalty": 1.0, "return_dict": true, "return_dict_in_generate": false, "sep_token_id": null, "stage_names": [ "stem", "stage1", "stage2", "stage3", "stage4" ], "suppress_tokens": null, "task_specific_params": null, "temperature": 1.0, "tf_legacy_loss": false, "tie_encoder_decoder": false, "tie_word_embeddings": true, "tokenizer_class": null, "top_k": 50, "top_p": 1.0, "torch_dtype": null, "torchscript": false, "transformers_version": "4.30.0.dev0", "typical_p": 1.0, "use_absolute_embeddings": false, "use_bfloat16": false, "window_size": 12 }, "ce_weight": 1.0, "cross_entropy_weight": 1.0, "decoder_config": { "_commit_hash": null, "_name_or_path": "", "activation_dropout": 0.0, "activation_function": "relu", "add_cross_attention": false, "architectures": null, "attention_dropout": 0.0, "auxiliary_loss": false, "backbone": "resnet50", "backbone_config": null, "bad_words_ids": null, "bbox_cost": 5, "bbox_loss_coefficient": 5, "begin_suppress_tokens": null, "bos_token_id": null, "chunk_size_feed_forward": 0, "class_cost": 1, "cross_attention_hidden_size": null, "d_model": 256, "decoder_attention_heads": 8, "decoder_ffn_dim": 2048, "decoder_layerdrop": 0.0, "decoder_layers": 6, "decoder_start_token_id": null, "dice_loss_coefficient": 1, "dilation": false, "diversity_penalty": 0.0, "do_sample": false, "dropout": 0.1, "early_stopping": false, "encoder_attention_heads": 8, "encoder_ffn_dim": 2048, "encoder_layerdrop": 0.0, "encoder_layers": 6, "encoder_no_repeat_ngram_size": 0, "eos_coefficient": 0.1, "eos_token_id": null, "exponential_decay_length_penalty": null, "finetuning_task": null, "forced_bos_token_id": null, "forced_eos_token_id": null, "giou_cost": 2, "giou_loss_coefficient": 2, "id2label": { "0": "LABEL_0", "1": "LABEL_1" }, "init_std": 0.02, "init_xavier_std": 1.0, "is_decoder": false, "is_encoder_decoder": true, "label2id": { 
"LABEL_0": 0, "LABEL_1": 1 }, "length_penalty": 1.0, "mask_loss_coefficient": 1, "max_length": 20, "max_position_embeddings": 1024, "min_length": 0, "model_type": "detr", "no_repeat_ngram_size": 0, "num_beam_groups": 1, "num_beams": 1, "num_channels": 3, "num_hidden_layers": 6, "num_queries": 100, "num_return_sequences": 1, "output_attentions": false, "output_hidden_states": false, "output_scores": false, "pad_token_id": null, "position_embedding_type": "sine", "prefix": null, "problem_type": null, "pruned_heads": {}, "remove_invalid_values": false, "repetition_penalty": 1.0, "return_dict": true, "return_dict_in_generate": false, "scale_embedding": false, "sep_token_id": null, "suppress_tokens": null, "task_specific_params": null, "temperature": 1.0, "tf_legacy_loss": false, "tie_encoder_decoder": false, "tie_word_embeddings": true, "tokenizer_class": null, "top_k": 50, "top_p": 1.0, "torch_dtype": null, "torchscript": false, "transformers_version": "4.17.0.dev0", "typical_p": 1.0, "use_bfloat16": false, "use_pretrained_backbone": true, "use_timm_backbone": true }, "dice_weight": 1.0, "fpn_feature_size": 256, "id2label": { "0": "bed", "1": "windowpane", "2": "cabinet", "3": "person", "4": "door", "5": "table", "6": "curtain", "7": "chair", "8": "car", "9": "painting", "10": "sofa", "11": "shelf", "12": "mirror", "13": "armchair", "14": "seat", "15": "fence", "16": "desk", "17": "wardrobe", "18": "lamp", "19": "bathtub", "20": "railing", "21": "cushion", "22": "box", "23": "column", "24": "signboard", "25": "chest of drawers", "26": "counter", "27": "sink", "28": "fireplace", "29": "refrigerator", "30": "stairs", "31": "case", "32": "pool table", "33": "pillow", "34": "screen door", "35": "bookcase", "36": "coffee table", "37": "toilet", "38": "flower", "39": "book", "40": "bench", "41": "countertop", "42": "stove", "43": "palm", "44": "kitchen island", "45": "computer", "46": "swivel chair", "47": "boat", "48": "arcade machine", "49": "bus", "50": "towel", "51": "light", "52": "truck", "53": "chandelier", "54": "awning", "55": "streetlight", "56": "booth", "57": "television receiver", "58": "airplane", "59": "apparel", "60": "pole", "61": "bannister", "62": "ottoman", "63": "bottle", "64": "van", "65": "ship", "66": "fountain", "67": "washer", "68": "plaything", "69": "stool", "70": "barrel", "71": "basket", "72": "bag", "73": "minibike", "74": "oven", "75": "ball", "76": "food", "77": "step", "78": "trade name", "79": "microwave", "80": "pot", "81": "animal", "82": "bicycle", "83": "dishwasher", "84": "screen", "85": "sculpture", "86": "hood", "87": "sconce", "88": "vase", "89": "traffic light", "90": "tray", "91": "ashcan", "92": "fan", "93": "plate", "94": "monitor", "95": "bulletin board", "96": "radiator", "97": "glass", "98": "clock", "99": "flag" }, "init_std": 0.02, "init_xavier_std": 1.0, "label2id": { "airplane": 58, "animal": 81, "apparel": 59, "arcade machine": 48, "armchair": 13, "ashcan": 91, "awning": 54, "bag": 72, "ball": 75, "bannister": 61, "barrel": 70, "basket": 71, "bathtub": 19, "bed": 0, "bench": 40, "bicycle": 82, "boat": 47, "book": 39, "bookcase": 35, "booth": 56, "bottle": 63, "box": 22, "bulletin board": 95, "bus": 49, "cabinet": 2, "car": 8, "case": 31, "chair": 7, "chandelier": 53, "chest of drawers": 25, "clock": 98, "coffee table": 36, "column": 23, "computer": 45, "counter": 26, "countertop": 41, "curtain": 6, "cushion": 21, "desk": 16, "dishwasher": 83, "door": 4, "fan": 92, "fence": 15, "fireplace": 28, "flag": 99, "flower": 38, "food": 76, "fountain": 
66, "glass": 97, "hood": 86, "kitchen island": 44, "lamp": 18, "light": 51, "microwave": 79, "minibike": 73, "mirror": 12, "monitor": 94, "ottoman": 62, "oven": 74, "painting": 9, "palm": 43, "person": 3, "pillow": 33, "plate": 93, "plaything": 68, "pole": 60, "pool table": 32, "pot": 80, "radiator": 96, "railing": 20, "refrigerator": 29, "sconce": 87, "screen": 84, "screen door": 34, "sculpture": 85, "seat": 14, "shelf": 11, "ship": 65, "signboard": 24, "sink": 27, "sofa": 10, "stairs": 30, "step": 77, "stool": 69, "stove": 42, "streetlight": 55, "swivel chair": 46, "table": 5, "television receiver": 57, "toilet": 37, "towel": 50, "trade name": 78, "traffic light": 89, "tray": 90, "truck": 52, "van": 64, "vase": 88, "wardrobe": 17, "washer": 67, "windowpane": 1 }, "mask_feature_size": 256, "mask_weight": 20.0, "model_type": "maskformer", "no_object_weight": 0.1, "num_attention_heads": 8, "num_hidden_layers": 6, "num_queries": 100, "output_auxiliary_logits": null, "torch_dtype": "float32", "transformers_version": null, "use_auxiliary_loss": false }

Thank you in advance for your help and have a good day !

Robotatron commented 1 year ago

I need help converting COCO JSON annotations for instance segmentation to the Huggingface datasets format. Existing Huggingface tutorials mainly cover downloading datasets from the Hub, and I haven't found guidance on preparing custom datasets from custom COCO annotations. If there are no existing resources, I'm curious if Huggingface plans to support COCO datasets natively in the future or provide a tutorial for preparing custom datasets. Your assistance would be greatly appreciated.

Robotatron commented 1 year ago

@lombardata the tutorial to fine-tune MaskFormer for instance segmentation is over here: https://pyimagesearch.com/2023/03/13/train-a-maskformer-segmentation-model-with-hugging-face-transformers/

You can simply swap the Config, ImageProcessor and Model class names with the corresponding Mask2Former classes and use the same code :)

This tutorial unfortunately does not cover how to load a custom dataset in COCO format, which is a very popular format in computer vision, and many CV libraries have native support for it (Detectron2, mmDet, EasyCV, etc.).

I think it would be very beneficial to the whole community if we had a tutorial on how to use HuggingFace vision models with a custom COCO JSON dataset. Your tutorials for Mask2Former and MaskFormer directly download the dataset from the hub, without showing how to actually convert a COCO JSON to the format HuggingFace supports (the weird RGB encoding).

lombardata commented 1 year ago

Hi @alaradirik and @NielsRogge, I successfully ran the tutorial you published at https://pyimagesearch.com/2023/03/13/train-a-maskformer-segmentation-model-with-hugging-face-transformers/ as well as the other tutorials you have done on DETR and MaskFormer. The only problem, as pointed out by @Robotatron, is that all these tutorials do fine-tuning on datasets that are already uploaded to HuggingFace (Scene Parsing dataset, ade20k-panoptic-demo, etc.), but no information is available on how to upload a custom dataset to HuggingFace (in order to fine-tune the model on it, of course). I checked the dataset I uploaded some months ago, https://huggingface.co/datasets/lombardata/data_2017/viewer/lombardata--data_2017 , using the ImageFolder function, but found that there are still some problems (e.g. the label (class label) field is empty). Do you have any suggestion on how we can load a custom dataset with annotations in the standard COCO format? I have both a dataset with instance segmentation annotations and a dataset with panoptic segmentation annotations (both in COCO format), but cannot find a way to upload these datasets to HuggingFace. Thank you in advance for your great help and for the work you are doing for the whole community :) !

NielsRogge commented 1 year ago

Hi @lombardata, thanks for letting me know. I've pinged the Datasets team to add support for the COCO native format.

For now, refer to this guide: https://huggingface.co/docs/datasets/image_dataset#object-detection.

I too hope this will be improved in the near future. We are working towards that goal, e.g. we're adding COCO metrics to the evaluate library and leveraging them for this leaderboard: https://huggingface.co/spaces/rafaelpadilla/object_detection_leaderboard.

After that, it should become easier to access COCO-style datasets.

daudaml commented 1 year ago

Hi @lombardata

You can use this function to convert your instance segmentation COCO annotations to masks.


import os
import cv2
from pycocotools.coco import COCO
import numpy as np

def coco2masks(coco_annotation_file, images_directory, output_directory):
    # writes one PNG per image, with class ids in channel 0 and instance ids in channel 1
    coco = COCO(coco_annotation_file)
    image_ids = coco.getImgIds()

    if not os.path.exists(output_directory):
        os.makedirs(output_directory)

    for image_id in image_ids:
        img_info = coco.loadImgs(image_id)[0]
        image_filename = img_info['file_name']
        image_path = os.path.join(images_directory, image_filename)
        image = cv2.imread(image_path)

        ann_ids = coco.getAnnIds(imgIds=image_id)
        annotations = coco.loadAnns(ann_ids)

        class_map = np.zeros_like(image[:, :, 0], dtype=np.uint8)
        instance_map = np.zeros_like(image[:, :, 0], dtype=np.uint8)

        for idx, ann in enumerate(annotations):
            mask = coco.annToMask(ann)
            class_id = ann['category_id']
            instance_id = idx + 1  # Start instance IDs from 1

            class_map[mask > 0] = class_id
            instance_map[mask > 0] = instance_id

        output_filename = os.path.splitext(image_filename)[0] + '_mask.png'
        output_path = os.path.join(output_directory, output_filename)

        mask_image = np.zeros((image.shape[0], image.shape[1], 3), dtype=np.uint8)
        mask_image[:, :, 0] = class_map
        mask_image[:, :, 1] = instance_map

        cv2.imwrite(output_path, mask_image)
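
A call would look something like this (paths are placeholders):

coco2masks("annotations/instances_train.json", "images/train", "masks/train")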
lombardata commented 1 year ago

Thank you very much @NielsRogge and @daudaml. Since my annotations were very particular, I wrote a custom dataloader and successfully ran the MaskFormer tutorials. Waiting for COCO metrics in the evaluate library! Thank you for your help :)

Robotatron commented 1 year ago

@daudaml Thanks for the code snippet. From what I see it creates class_map and instance_map masks; is that really necessary for Mask2Former? If I'm not mistaken, @NielsRogge mentioned in another GitHub issue that plain instance maps (black and white bit masks) plus a label (i.e. 1 or 5) should be enough?

@lombardata Did you have your annotations in a normal COCO format? If so, it would be great if you could share your parsing / data loader script, as many people have issues getting their COCO datasets into HuggingFace, thanks!

lombardata commented 1 year ago

Hi @Robotatron, here you can find the code: https://github.com/lombardata/Upload_COCO_ds_on_huggingface/blob/main/upload_custom_ds_panoptic_to_hf.ipynb You will have to adapt it to your dataset of course, but the main idea is there. I used it to upload a panoptic dataset to the HuggingFace hub and trained MaskFormer on it. Hope it helps!

ruturaj-gh commented 11 months ago

Hello, we are building a brain tumor detection project for our final year, in which we want to segment MRI images using Mask2Former, but I don't know how to do that. Does anyone have any idea about it?

Ahmed-G-ElTaher commented 7 months ago

Hi @daudaml,

Thank you for sharing your helpful code! I appreciate the effort you put into this experiment. While using your code, I identified a few potential issues that I was able to address:

1. Image Alignment:

The original code did not save the original images, causing misalignment with the masks when used with a dataset class from Transformers. To ensure alignment, I saved the original images (so the dataset class can correctly pair each image with its mask) and renamed the images numerically (which makes sorting easier and improves clarity).

2. Color Space Conversion:

cv2 reads and writes images in BGR format. The original code did not account for this, leading to potential errors in the model because the red channel ends up filled with 0. To resolve this, I convert the color space on both reading and writing, so the images end up in the desired format (e.g. RGB) and compatibility issues are avoided.

3. Overlapping Object Handling:

The original instance map method might not accurately handle overlapping objects, since later masks can overwrite earlier ones. To address this, I reordered the annotations within each image by object area, starting with the largest, so that smaller objects are drawn last and all objects appear in the final mask.

The Code:

import os
import cv2
from pycocotools.coco import COCO
import numpy as np

def coco2masks(coco_annotation_file, images_directory, output_mask_directory,output_images_directory):
    coco = COCO(coco_annotation_file)
    image_ids = coco.getImgIds()

    if not os.path.exists(output_mask_directory):
        os.makedirs(output_mask_directory)
    if not os.path.exists(output_images_directory):
        os.makedirs(output_images_directory)

    for image_id in image_ids:
        img_info = coco.loadImgs(image_id)[0]
        image_filename = img_info['file_name']
        image_path = os.path.join(images_directory, image_filename)
        image = cv2.imread(image_path,cv2.IMREAD_COLOR)

        ann_ids = coco.getAnnIds(imgIds=image_id)
        annotations = coco.loadAnns(ann_ids)
        # draw the largest objects first so smaller, overlapping objects are not overwritten
        sorted_annotations = sorted(annotations, key=lambda ann: ann['area'], reverse=True)

        class_map = np.zeros_like(image[:, :, 0], dtype=np.uint8)
        instance_map = np.zeros_like(image[:, :, 0], dtype=np.uint8)

        for idx, ann in enumerate(sorted_annotations):
            mask = coco.annToMask(ann)
            class_id = ann['category_id']
            instance_id = idx + 1  # Start instance IDs from 1

            class_map[mask > 0] = class_id
            instance_map[mask > 0] = instance_id

        output_filename = str(image_id) + os.path.splitext(image_filename)[1]
        output_mask_path = os.path.join(output_mask_directory, str(image_id) + '.mask.png')
        output_image_path = os.path.join(output_images_directory, output_filename)

        mask_image = np.zeros_like(image, dtype=np.uint8)
        mask_image[:, :, 0] = class_map
        mask_image[:, :, 1] = instance_map

        # write the mask and the re-saved image, converting the colour space as described above
        cv2.imwrite(output_mask_path, cv2.cvtColor(mask_image, cv2.COLOR_RGB2BGR))
        print(output_mask_path)
        cv2.imwrite(output_image_path, cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
        print(output_image_path)
    print(image_ids)

Use it as follows: coco2masks('/path/annotations.json', '/path/images', '/path/train/train_masks', '/path/train/train_images')

I believe these improvements enhance the code's usability and accuracy. I hope you find them helpful! @lombardata

Also, if anyone wants to split the annotation JSON file into train, validation and test files before using this method, they can use the approach in this topic --> Efficiently Splitting COCO Annotations for Model Training (Code Included): How to split the JSON annotation file in COCO format into train, validation and test JSON files.

jetsonwork commented 5 months ago

Hi, could you please let me know how I should prepare the dataset for instance segmentation when my dataset has more than 255 classes?

Ahmed-G-ElTaher commented 5 months ago

Hi, could you please let me know how I should prepare the dataset for instance segmentation when my dataset has more than 255 classes?

If you have images and want to annotate them, you can use CVAT, VGG Annotator, or any other online tool like Roboflow.

jetsonwork commented 5 months ago

Thanks @Ahmed-G-ElTaher. Basically I want to convert the COCO format to a mask image, similar to the shared function, but for more than 256 classes.

Ahmed-G-ElTaher commented 5 months ago

@jetsonwork it's an interesting question 🤔... I will think about it... You could also look at how the instance segmentation COCO dataset itself is annotated, because that dataset has 1000 classes and is prepared for instance segmentation.

Also, I want to help you with any further progress, so please connect with me on LinkedIn (my account is in my GitHub bio) or send me an email: ahmed.g.eltaher@gmail.com