SJTU-LuHe / TransVOD

This repository is the code for the paper "End-to-End Video Object Detection with Spatial-Temporal Transformers".
Apache License 2.0

Directions to train TransVOD on custom dataset #22

Open phanikumarmalladi opened 1 year ago

phanikumarmalladi commented 1 year ago

Please let me know how to train it on a custom dataset and the necessary structure of the dataset.

itbergl commented 1 year ago

First, make a directory containing all frames as images, organized however you want. I would suggest something like this:

    my_dataset
        VID_0001
            IMG_0001.png
            IMG_0002.png
            ...
        VID_0002
            ...
        ...

I would suggest using a symlink rather than copying the frames into the repository.
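For example, from Python (both paths here are placeholders for wherever you actually keep the frames and wherever the repo expects them):

    import os

    # Link the frames into the repo's data directory without copying them.
    # Both paths below are placeholders -- adjust to your own layout.
    os.symlink("/absolute/path/to/my_dataset", "data/vid/Data/my_dataset")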

Then, in the folder `data/vid/annotations/`, make a `.json` file and call it whatever you want. The file should be structured like [MS COCO](https://www.immersivelimit.com/tutorials/create-coco-annotations-from-scratch). For bounding box detection, the structure below worked for me (each annotation needs a COCO-style `bbox` and a `category_id`). There are some fields like `iscrowd` that aren't used, but I kept them in just in case.

    {
        "categories": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "name": {"type": "string"},
                    "encoded_name": {"type": "string"}
                }
            }
        },
        "videos": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "name": {"type": "string"},
                    "vid_train_frames": {"const": []}
                }
            }
        },
        "images": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "file_name": {"type": "string"},
                    "id": {"type": "integer"},
                    "height": {"type": "integer"},
                    "width": {"type": "integer"},
                    "frame_id": {"type": "integer"},
                    "video_id": {"type": "integer"},
                    "is_vid_train_frame": {"const": false}
                }
            }
        },
        "annotations": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "image_id": {"type": "integer"},
                    "video_id": {"type": "integer"},
                    "instance_id": {"type": "integer"},
                    "category_id": {"type": "integer"},
                    "bbox": {"type": "array", "items": {"type": "number"}},
                    "area": {"type": "integer"},
                    "iscrowd": {"const": false},
                    "occluded": {"const": -1},
                    "generated": {"const": -1}
                }
            }
        }
    }

You will have to ensure unique IDs across the images, videos, and annotations. Additionally, each image's `file_name` should point to where you put the frames in the first step.
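If it helps, here is a rough sketch of a script that generates such a file from the directory layout above. `load_boxes` is a placeholder for however you read your own labels, and the sequential-ID scheme just satisfies the uniqueness requirement above:

    import json
    from pathlib import Path

    from PIL import Image


    def load_boxes(img_path):
        """Placeholder: return [(category_id, [x, y, w, h]), ...] for one frame."""
        raise NotImplementedError


    def build_annotations(frames_root, out_path, categories):
        frames_root = Path(frames_root)
        videos, images, annotations = [], [], []
        for video_id, vid_dir in enumerate(sorted(p for p in frames_root.iterdir() if p.is_dir())):
            videos.append({"id": video_id, "name": vid_dir.name, "vid_train_frames": []})
            for frame_id, img_path in enumerate(sorted(vid_dir.glob("*.png"))):
                width, height = Image.open(img_path).size
                image_id = len(images)  # sequential IDs keep everything unique
                images.append({
                    "file_name": str(img_path.relative_to(frames_root)),
                    "id": image_id,
                    "height": height,
                    "width": width,
                    "frame_id": frame_id,
                    "video_id": video_id,
                    "is_vid_train_frame": False,
                })
                for category_id, bbox in load_boxes(img_path):
                    annotations.append({
                        "id": len(annotations),
                        "image_id": image_id,
                        "video_id": video_id,
                        "instance_id": -1,  # per-object track id if you have one
                        "category_id": category_id,
                        "bbox": bbox,  # [x, y, w, h], COCO style
                        "area": bbox[2] * bbox[3],
                        "iscrowd": False,
                        "occluded": -1,
                        "generated": -1,
                    })
        out = {"categories": categories, "videos": videos,
               "images": images, "annotations": annotations}
        Path(out_path).write_text(json.dumps(out))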

To make the model actually read the `.json` file for the single-frame baseline, add keys to the `PATHS` dictionary at the bottom of `datasets/vid_single.py`:

    PATHS = {
        ...
        "custom_dataset": [(root / "Data" / "DSTG", root / "annotations" / "zoomed" / "custom_dataset.json")],
    }

Add a transformation to `make_coco_transform(image_set)`:

    if image_set == 'custom_dataset':
        return T.Compose([
            T.RandomHorizontalFlip(),
            T.RandomResize([600], max_size=1000),
            normalize,
        ])

And in `main.py`, you can change `image_set` to `"custom_dataset"` for `dataset_train`:

    dataset_train = build_dataset(image_set="custom_dataset", args=args)

I would suggest doing this twice: once for your training set and once for your validation set. In the step above, you could equally do the same for `dataset_vid`.
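For instance, after registering a second `PATHS` key for validation (the key name `custom_dataset_val` here is just an example):

    # assumes a "custom_dataset_val" entry was also added to PATHS
    dataset_val = build_dataset(image_set="custom_dataset_val", args=args)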

Also, if you are working on this project long-term, I would suggest looking into the command-line args so you aren't hardcoding values in `main.py`.
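For example, something along these lines in `main.py` (the flag name `--train_image_set` is made up for illustration; check what the existing argparse parser already provides):

    # hypothetical flag -- main.py's parser may already expose something similar
    parser.add_argument('--train_image_set', default='custom_dataset',
                        help='key into the PATHS dict in datasets/vid_single.py')

    # ... later, when building the datasets:
    dataset_train = build_dataset(image_set=args.train_image_set, args=args)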

yx-withStars commented 1 year ago

Regarding the suggested layout (`my_dataset/VID_0001/IMG_0001.png`, ...): I want to know where to put `my_dataset`. Thank you!

mayuresh12345 commented 12 months ago

Hi @itbergl, could you please share the script you used for generating the `.json` annotation file?