Hi @Robotatron, please find the answers to your questions below:
The `dataset_mappers` in this repo expect the COCO dataset in a panoptic format. You need to pay attention to two files here:
You encounter the `UnboundLocalError: local variable 'pan_seg_gt' referenced before assignment` error because your dataset's metadata does not have a `pan_seg_file_name` key: https://github.com/SHI-Labs/OneFormer/blob/33ebb56ed34f970a30ae103e786c0cb64c653d9a/oneformer/data/dataset_mappers/coco_unified_new_baseline_dataset_mapper.py#L306
So, because you are using a custom dataset for instance segmentation, you should ensure the following two things:

- Your dataset's metadata has all the necessary attributes. Create a new dataset registry file if necessary.
- The attributes align with the ones used in the dataset mapper. Write a new dataset mapper.

It's pretty easy to ensure these, and it should only take you a short time. Remember, because you only care about instance segmentation, you need to define the `thing_classes` correctly in your `dataset_registry`, and everything will work.
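For instance segmentation, a minimal registration sketch with detectron2 (the dataset name and paths below are placeholders, and setting `ignore_label` explicitly is an assumption, since OneFormer's mappers read it from the metadata):

```python
# Minimal sketch, assuming a COCO-style instance JSON; name/paths are placeholders.
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.data.datasets import register_coco_instances

register_coco_instances(
    "my_dataset_train", {}, "path/to/annotations_train.json", "path/to/train_images"
)

# Loading the dicts once populates thing_classes and
# thing_dataset_id_to_contiguous_id from the JSON categories.
DatasetCatalog.get("my_dataset_train")
meta = MetadataCatalog.get("my_dataset_train")
print(meta.thing_classes)

# OneFormer's dataset mappers also read ignore_label from the metadata;
# a plain instance dataset does not have it, so set it explicitly.
meta.set(ignore_label=255)
```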
@praeclarumjj3 Thanks for the answer. Dataset mapping is new to me, so please bear with me :)
> Your dataset's metadata has all the necessary attributes. Create a new dataset registry file if necessary.
I am registering the dataset as I did with Mask2Former, with `detectron2.data.datasets.register_coco_instances`. Is that enough, or should I use your `register_coco_panoptic_annos_sem_seg()` from `register_coco_panoptic_annos_semseg.py`? Please see the image below under "details".
> The attributes align with the ones used in the dataset mapper. Write a new dataset mapper.
Are you talking about these attributes?
```python
dataset_dict["sem_seg"] = torch.from_numpy(sem_seg).long()
dataset_dict["instances"] = instances
dataset_dict["orig_shape"] = image_shape
dataset_dict["task"] = task
dataset_dict["text"] = text
dataset_dict["thing_ids"] = self.things
```
If so, what do I put under `"sem_seg"`, since I don't have semantic segmentation labels, just an empty list?
I think I am close. I've tried using the dataset mapper from Mask2Former and modified it like this (mainly at the end, when setting the attributes of the `dataset_dict`):
It almost works; the model gets initialized and the dataloader provides images:
But when starting training, it breaks:
Full error:
Hi @Robotatron, you can refer to the following code to understand how to set the `text` attribute inside the dataset dictionary: https://github.com/SHI-Labs/OneFormer/blob/1780582ccfd29296d8ce1027e205a8dfa46746d4/oneformer/data/dataset_mappers/coco_unified_new_baseline_dataset_mapper.py#L197-L225

All you need to do is loop through the class ids and collect the corresponding class names.

You do not need to worry about the `sem_seg` attribute, as it is not required during training, so you can ignore it.
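As a rough illustration of that loop (a simplified sketch, not the repo's exact `_get_texts` method):

```python
# Simplified sketch (not the repo's exact code): one "a photo with a
# {cls_name}" entry per ground-truth instance, padded with a generic
# prompt up to num_queries entries.
def build_texts(gt_class_ids, class_names, num_queries):
    texts = ["an instance photo"] * num_queries
    for slot, class_id in enumerate(gt_class_ids[:num_queries]):
        texts[slot] = f"a photo with a {class_names[class_id]}"
    return texts
```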
You encounter the `RuntimeError: Default process group has not been initialized, please make sure to call init_process_group` error because you are probably trying to start distributed training on a single GPU. This issue might be helpful.
Thanks for the reply @praeclarumjj3!
I've looked at your linked issue (https://github.com/facebookresearch/detectron2/issues/3972); they recommend replacing the "SyncBN" batch norm with a simple "BN". However, it seems OneFormer does not use "SyncBN"; it uses group norm ("GN").
I tried replacing `cfg.MODEL.SEM_SEG_HEAD.NORM = "GN"` with `"BN"`, but I am still getting the same `RuntimeError: Default process group has not been initialized`.
Have you ever tried training OneFormer on a single machine with a single GPU or do you know if it is supported at all?
Hi @Robotatron, after taking a closer look at your error, I believe it corresponds to the `contrastive_loss` definition. While experimenting, we never tried training with a single GPU, so we never encountered this error. It comes from the fact that we use `dist` in our `contrastive_loss` definition, which will throw an error with a single GPU.

To train on a single GPU, please replace the contrastive loss method with the code below (with a check for whether the distributed process group has been initialized). I have also updated the `main` branch code, so you can directly pull and run it. Thanks for bringing this to my attention.
```python
def loss_contrastive(self, outputs, targets, indices, num_masks):
    assert "contrastive_logits" in outputs
    assert "texts" in outputs
    image_x = outputs["contrastive_logits"].float()

    batch_size = image_x.shape[0]
    # get label globally: offset by rank so labels index into the
    # all-gathered batch; on a single GPU there is no offset
    if is_dist_avail_and_initialized():
        labels = torch.arange(batch_size, dtype=torch.long, device=image_x.device) + batch_size * dist.get_rank()
    else:
        labels = torch.arange(batch_size, dtype=torch.long, device=image_x.device)

    text_x = outputs["texts"]

    # [B, C]
    image_x = F.normalize(image_x.flatten(1), dim=-1)
    text_x = F.normalize(text_x.flatten(1), dim=-1)

    # gather embeddings across GPUs only when a process group exists
    if is_dist_avail_and_initialized():
        logits_per_img = image_x @ dist_collect(text_x).t()
        logits_per_text = text_x @ dist_collect(image_x).t()
    else:
        logits_per_img = image_x @ text_x.t()
        logits_per_text = text_x @ image_x.t()

    logit_scale = torch.clamp(self.logit_scale.exp(), max=100)

    loss_img = self.cross_entropy(logits_per_img * logit_scale, labels)
    loss_text = self.cross_entropy(logits_per_text * logit_scale, labels)

    loss_contrastive = loss_img + loss_text

    losses = {"loss_contrastive": loss_contrastive}
    return losses
```
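For reference, `is_dist_avail_and_initialized` is the standard DETR-style helper (the exact definition in this repo may differ slightly); it returns `True` only when `torch.distributed` is both available and initialized:

```python
import torch.distributed as dist

def is_dist_avail_and_initialized():
    # False when torch was built without distributed support,
    # or when no process group has been initialized (single-GPU run).
    if not dist.is_available():
        return False
    if not dist.is_initialized():
        return False
    return True
```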
@praeclarumjj3 Thanks for updating the repo; the training now works on a single GPU <3
One last question regarding the `text` attribute. You said earlier:
> Hi @Robotatron, you can refer to the following code to understand how to set the `text` attribute inside the dataset dictionary. All you need to do is loop through the class ids and collect the corresponding class names.
I am not really sure what the purpose of this is; it seems to be a list of strings of the format `"a photo with a {cls_name}"`.
I took your code for setting up the "texts" attribute and tried to adjust it for the instance segmentation Mask2Former dataset mapper. I am stuck at the following two lines in your code (196 and 197): https://github.com/SHI-Labs/OneFormer/blob/main/oneformer/data/dataset_mappers/coco_unified_new_baseline_dataset_mapper.py#L196
There is no `segments_info` in my instance segmentation COCO `dataset_dict`, so I've used `annos` from the Mask2Former dataset mapper:
```python
annos = [
    utils.transform_instance_annotations(obj, transforms, image.shape[:2])
    for obj in dataset_dict.pop("annotations")
    if obj.get("iscrowd", 0) == 0
]
```
But then I was not sure what mask to use. I don't have a `pan_seg_gt` variable in the instance segmentation dataset mapper, so the if statement (`if not np.all(mask == False):`) would not work, and I deleted these two lines:
```python
mask = pan_seg_gt == segment_info["id"]
if not np.all(mask == False):
```
and the code for setting up the "text" attribute looks like this (on the right, your original code on the left):
Can the `text` attribute be populated like in the image above, or should I still use the `mask` variable with the if statement?
Since I don't have `segments_info` in the `dataset_dict`, I am unsure what to use instead of `pan_seg_gt` and `segment_info["id"]`. For `pan_seg_gt`, maybe I can use `instances.gt_masks` instead? Or is `pan_seg_gt` an ID of the ground truth annotation?
@Robotatron, are you able to train OneFormer for instance segmentation?
Yes, it works and trains. Maybe I should submit a PR with a modified Mask2Former data mapper for instance segmentation, if @praeclarumjj3 is OK with it.
Hi @Robotatron, thanks for the offer. As many people have been opening issues regarding custom training, I will push some custom `dataset_mappers` with documentation for everyone's reference.
Hi @Robotatron, yes, we use `dataset_dict["text"]` during training to obtain the `text_queries`, which are used while calculating the query-text contrastive loss. I request you to refer to Fig. 3 and Sec. 3.1 in our paper for more details.
Also, you may use the following script to train OneFormer on a custom instance segmentation dataset. Remember you will need to register your dataset's metadata with the class names.
```python
import copy
import logging

import numpy as np
import torch

from detectron2.config import configurable
from detectron2.data import MetadataCatalog
from detectron2.data import detection_utils as utils
from detectron2.data import transforms as T

from oneformer.data.tokenizer import SimpleTokenizer, Tokenize
from pycocotools import mask as coco_mask

__all__ = ["InstanceCOCOCustomNewBaselineDatasetMapper"]


def convert_coco_poly_to_mask(segmentations, height, width):
    masks = []
    for polygons in segmentations:
        rles = coco_mask.frPyObjects(polygons, height, width)
        mask = coco_mask.decode(rles)
        if len(mask.shape) < 3:
            mask = mask[..., None]
        mask = torch.as_tensor(mask, dtype=torch.uint8)
        mask = mask.any(dim=2)
        masks.append(mask)
    if masks:
        masks = torch.stack(masks, dim=0)
    else:
        masks = torch.zeros((0, height, width), dtype=torch.uint8)
    return masks


def build_transform_gen(cfg, is_train):
    """
    Create a list of default :class:`Augmentation` from config.
    Now it includes resizing and flipping.

    Returns:
        list[Augmentation]
    """
    assert is_train, "Only support training augmentation"
    image_size = cfg.INPUT.IMAGE_SIZE
    min_scale = cfg.INPUT.MIN_SCALE
    max_scale = cfg.INPUT.MAX_SCALE

    augmentation = []

    if cfg.INPUT.RANDOM_FLIP != "none":
        augmentation.append(
            T.RandomFlip(
                horizontal=cfg.INPUT.RANDOM_FLIP == "horizontal",
                vertical=cfg.INPUT.RANDOM_FLIP == "vertical",
            )
        )

    augmentation.extend([
        T.ResizeScale(
            min_scale=min_scale, max_scale=max_scale, target_height=image_size, target_width=image_size
        ),
        T.FixedSizeCrop(crop_size=(image_size, image_size)),
    ])

    return augmentation


# This is specifically designed for the COCO Instance Segmentation dataset.
class InstanceCOCOCustomNewBaselineDatasetMapper:
    """
    A callable which takes a dataset dict in Detectron2 Dataset format,
    and maps it into a format used by OneFormer for custom instance
    segmentation using the COCO format.

    The callable currently does the following:

    1. Reads the image from "file_name"
    2. Applies geometric transforms to the image and annotations
    3. Finds and applies suitable cropping to the image and annotations
    4. Prepares the image and annotations as Tensors
    """

    @configurable
    def __init__(
        self,
        is_train=True,
        *,
        num_queries,
        tfm_gens,
        meta,
        image_format,
        max_seq_len,
        task_seq_len,
    ):
        """
        NOTE: this interface is experimental.
        Args:
            is_train: for training or inference
            tfm_gens: data augmentation
            image_format: an image format supported by :func:`detection_utils.read_image`.
        """
        self.tfm_gens = tfm_gens
        logging.getLogger(__name__).info(
            "[InstanceCOCOCustomNewBaselineDatasetMapper] Full TransformGens used in training: {}".format(
                str(self.tfm_gens)
            )
        )

        self.img_format = image_format
        self.is_train = is_train
        self.meta = meta
        self.ignore_label = self.meta.ignore_label
        self.num_queries = num_queries

        self.things = []
        for k, v in self.meta.thing_dataset_id_to_contiguous_id.items():
            self.things.append(v)
        self.class_names = self.meta.thing_classes
        self.text_tokenizer = Tokenize(SimpleTokenizer(), max_seq_len=max_seq_len)
        self.task_tokenizer = Tokenize(SimpleTokenizer(), max_seq_len=task_seq_len)

    @classmethod
    def from_config(cls, cfg, is_train=True):
        # Build augmentation
        tfm_gens = build_transform_gen(cfg, is_train)
        dataset_names = cfg.DATASETS.TRAIN
        meta = MetadataCatalog.get(dataset_names[0])

        ret = {
            "is_train": is_train,
            "meta": meta,
            "tfm_gens": tfm_gens,
            "image_format": cfg.INPUT.FORMAT,
            "num_queries": cfg.MODEL.ONE_FORMER.NUM_OBJECT_QUERIES - cfg.MODEL.TEXT_ENCODER.N_CTX,
            "task_seq_len": cfg.INPUT.TASK_SEQ_LEN,
            "max_seq_len": cfg.INPUT.MAX_SEQ_LEN,
        }
        return ret

    def _get_texts(self, classes, num_class_obj):
        classes = list(np.array(classes))
        texts = ["an instance photo"] * self.num_queries

        for class_id in classes:
            cls_name = self.class_names[class_id]
            num_class_obj[cls_name] += 1

        num = 0
        for i, cls_name in enumerate(self.class_names):
            if num_class_obj[cls_name] > 0:
                for _ in range(num_class_obj[cls_name]):
                    if num >= len(texts):
                        break
                    texts[num] = f"a photo with a {cls_name}"
                    num += 1

        return texts

    def __call__(self, dataset_dict):
        """
        Args:
            dataset_dict (dict): Metadata of one image, in Detectron2 Dataset format.

        Returns:
            dict: a format that builtin models in detectron2 accept
        """
        dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified by code below
        image = utils.read_image(dataset_dict["file_name"], format=self.img_format)
        utils.check_image_size(dataset_dict, image)

        # TODO: get padding mask
        # by feeding a "segmentation mask" to the same transforms
        padding_mask = np.ones(image.shape[:2])

        image, transforms = T.apply_transform_gens(self.tfm_gens, image)
        # the crop transformation has default padding value 0 for segmentation
        padding_mask = transforms.apply_segmentation(padding_mask)
        padding_mask = ~padding_mask.astype(bool)

        image_shape = image.shape[:2]  # h, w

        # Pytorch's dataloader is efficient on torch.Tensor due to shared-memory,
        # but not efficient on large generic data structures due to the use of pickle & mp.Queue.
        # Therefore it's important to use torch.Tensor.
        dataset_dict["image"] = torch.as_tensor(np.ascontiguousarray(image.transpose(2, 0, 1)))
        dataset_dict["padding_mask"] = torch.as_tensor(np.ascontiguousarray(padding_mask))

        if not self.is_train:
            # USER: Modify this if you want to keep them for some reason.
            dataset_dict.pop("annotations", None)
            return dataset_dict

        if "annotations" in dataset_dict:
            # USER: Modify this if you want to keep them for some reason.
            for anno in dataset_dict["annotations"]:
                anno.pop("keypoints", None)

            # USER: Implement additional transformations if you have other types of data
            annos = [
                utils.transform_instance_annotations(obj, transforms, image_shape)
                for obj in dataset_dict.pop("annotations")
                if obj.get("iscrowd", 0) == 0
            ]
            instances = utils.annotations_to_instances(annos, image_shape)
            instances.gt_boxes = instances.gt_masks.get_bounding_boxes()
            # Need to filter empty instances first (due to augmentation)
            instances = utils.filter_empty_instances(instances)

            # Generate masks from polygon
            h, w = instances.image_size
            # image_size_xyxy = torch.as_tensor([w, h, w, h], dtype=torch.float)
            if hasattr(instances, 'gt_masks'):
                gt_masks = instances.gt_masks
                gt_masks = convert_coco_poly_to_mask(gt_masks.polygons, h, w)
                instances.gt_masks = gt_masks

            num_class_obj = {}
            for name in self.class_names:
                num_class_obj[name] = 0

            task = "The task is instance"
            text = self._get_texts(instances.gt_classes, num_class_obj)

            dataset_dict["instances"] = instances
            dataset_dict["orig_shape"] = image_shape
            dataset_dict["task"] = task
            dataset_dict["text"] = text
            dataset_dict["thing_ids"] = self.things

        return dataset_dict
```
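To wire the mapper into training, here is a minimal sketch following detectron2's `DefaultTrainer` conventions (OneFormer's actual `train_net.py` defines its own trainer; this only illustrates the `build_train_loader` override):

```python
from detectron2.data import build_detection_train_loader
from detectron2.engine import DefaultTrainer


class Trainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        # @configurable lets the mapper be constructed directly from cfg.
        mapper = InstanceCOCOCustomNewBaselineDatasetMapper(cfg, True)
        return build_detection_train_loader(cfg, mapper=mapper)
```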
I pushed some instructions for training with custom datasets. You may take a look if you face any more issues: https://github.com/SHI-Labs/OneFormer/tree/main/datasets/custom_datasets.
What a legend! Will check it out when I have time later, thanks!
Hi @Robotatron, could you please help me with how to train the model on a custom dataset? Anything will be appreciated. Thanks!
Using my custom dataset in COCO format for instance segmentation training. Changed CFG to
Still getting an error: `UnboundLocalError: local variable 'pan_seg_gt' referenced before assignment`
From https://github.com/SHI-Labs/OneFormer/issues/5 and reading the docs, I understand I have to somehow prepare my dataset for instance segmentation training.
1. Is it correct to say OneFormer expects the COCO dataset in a panoptic format?
2. If 1) is true, how do I convert my custom instance-segmentation COCO dataset to a panoptic format?
3. I found a script from panopticapi to convert from instance to panoptic format, but judging from the description it will merge every instance annotation in an image into a single annotation, which would defeat the purpose of training an instance segmentation model. I am also getting a KeyError when using that script: https://github.com/cocodataset/panopticapi/issues/58
4. How do I prepare a detection COCO dataset to train for instance segmentation with OneFormer?

Thanks.