ucalyptus commented 4 years ago

COCO is a large-scale object detection, segmentation, and captioning dataset. Its features include object segmentation, recognition in context, superpixel stuff segmentation, 330K images (more than 200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, and 250,000 people with keypoints.

A COCO dataset is stored as JSON with the top-level sections “info”, “licenses”, “images”, and “annotations”, plus “categories” (in every annotation type except captions) and “segment_info” (only in panoptic annotations).

ucalyptus commented 4 years ago

    {
        "info": {...},
        "licenses": [...],
        "images": [...],
        "annotations": [...],
        "categories": [...],     <-- Not in Captions annotations
        "segment_info": [...]    <-- Only in Panoptic annotations
    }
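A quick way to see which of these sections a given annotation file actually contains is to load it with the standard `json` module and count the entries per section. This is only a minimal sketch: `summarize_coco` is a hypothetical helper name, and the small inline dictionary stands in for a real annotation file.

```python
import json

# Top-level sections from the outline above. "categories" is absent from
# captions files and "segment_info" appears only in panoptic files.
EXPECTED_SECTIONS = [
    "info", "licenses", "images", "annotations", "categories", "segment_info",
]


def summarize_coco(dataset):
    """Map each expected section to its entry count, or None if it is missing."""
    summary = {}
    for key in EXPECTED_SECTIONS:
        value = dataset.get(key)
        summary[key] = None if value is None else len(value)
    return summary


# Illustrative stand-in for the result of json.load() on a real file.
sample = json.loads(
    '{"info": {"year": 2017}, "licenses": [], "images": [{"id": 1}], "annotations": []}'
)
summary = summarize_coco(sample)
print(summary)
```

For a real file, replace `sample` with the dictionary returned by `json.load` on the opened annotation file.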

ucalyptus commented 4 years ago

INFO

The “info” section contains high level information about the dataset. If you are creating your own dataset, you can fill in whatever is appropriate.

    "info": {
        "description": "COCO 2017 Dataset",
        "url": "http://cocodataset.org",
        "version": "1.0",
        "year": 2017,
        "contributor": "COCO Consortium",
        "date_created": "2017/09/01"
    }
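If you are building your own dataset, the "info" block can be generated rather than typed by hand. The sketch below assumes you want the same keys as the COCO 2017 example; `make_info` and the values passed to it are hypothetical, not part of any COCO tooling.

```python
import datetime
import json


def make_info(description, version="1.0", contributor="", url=""):
    """Build an "info" dict with the same keys as the COCO 2017 example."""
    today = datetime.date.today()
    return {
        "description": description,
        "url": url,
        "version": version,
        "year": today.year,
        "contributor": contributor,
        # Same YYYY/MM/DD layout as the example above.
        "date_created": today.strftime("%Y/%m/%d"),
    }


info = make_info("My Custom Dataset", contributor="Example Lab")
print(json.dumps({"info": info}, indent=2))
```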

ucalyptus commented 4 years ago

LICENSES

The “licenses” section contains a list of image licenses that apply to images in the dataset. If you are sharing or selling your dataset, you should make sure your licenses are correctly specified and that you are not infringing on copyright.

    "licenses": [
        {
            "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
            "id": 1,
            "name": "Attribution-NonCommercial-ShareAlike License"
        },
        {
            "url": "http://creativecommons.org/licenses/by-nc/2.0/",
            "id": 2,
            "name": "Attribution-NonCommercial License"
        },
        ...
    ]
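Each image entry carries a "license" field that refers to one of these ids, so a lookup table makes it easy to check which license applies to a given image. A minimal sketch, using only the two license entries shown above; `license_name` is a hypothetical helper, and the second image deliberately points at an id that is not in this shortened list.

```python
licenses = [
    {
        "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
        "id": 1,
        "name": "Attribution-NonCommercial-ShareAlike License",
    },
    {
        "url": "http://creativecommons.org/licenses/by-nc/2.0/",
        "id": 2,
        "name": "Attribution-NonCommercial License",
    },
]

# Trimmed image entries; only the fields needed for the lookup are kept.
images = [
    {"id": 37777, "file_name": "000000037777.jpg", "license": 1},
    {"id": 397133, "file_name": "000000397133.jpg", "license": 4},
]

# Index the licenses by id once, then resolve each image against the index.
license_by_id = {lic["id"]: lic for lic in licenses}


def license_name(image):
    """Return the license name for an image, or None if its id is unknown."""
    lic = license_by_id.get(image["license"])
    return lic["name"] if lic else None


for image in images:
    print(image["file_name"], "->", license_name(image))
```

An image whose license id is missing from the list comes back as `None`, which is a cheap way to spot an incomplete "licenses" section before sharing a dataset.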

ucalyptus commented 4 years ago

IMAGES

The “images” section contains the complete list of images in your dataset. No labels, bounding boxes, or segmentations are specified here; it is simply a list of images with information about each one. Note that coco_url, flickr_url, and date_captured are just for reference. Your deep learning application will probably only need the file_name.

Note that image ids need to be unique (among other images), but they do not necessarily need to match the file name (unless the deep learning code you are using makes an assumption that they’ll be the same… developers are lazy, it wouldn’t surprise me).

    "images": [
        {
            "license": 4,
            "file_name": "000000397133.jpg",
            "coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg",
            "height": 427,
            "width": 640,
            "date_captured": "2013-11-14 17:02:52",
            "flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg",
            "id": 397133
        },
        {
            "license": 1,
            "file_name": "000000037777.jpg",
            "coco_url": "http://images.cocodataset.org/val2017/000000037777.jpg",
            "height": 230,
            "width": 352,
            "date_captured": "2013-11-14 20:55:31",
            "flickr_url": "http://farm9.staticflickr.com/8429/7839199426_f6d48aa585_z.jpg",
            "id": 37777
        },
        ...
    ]
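The uniqueness requirement on image ids is easy to check before training. The sketch below is one possible way to do it, not an official COCO utility: `index_images` is a hypothetical helper that rejects duplicate ids and returns an id to file_name mapping, and the two entries are trimmed copies of the sample images.

```python
from collections import Counter


def index_images(images):
    """Return {image id: file_name}, raising ValueError on duplicate ids."""
    counts = Counter(img["id"] for img in images)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate image ids: {duplicates}")
    return {img["id"]: img["file_name"] for img in images}


images = [
    {"id": 397133, "file_name": "000000397133.jpg", "height": 427, "width": 640},
    {"id": 37777, "file_name": "000000037777.jpg", "height": 230, "width": 352},
]

file_name_by_id = index_images(images)

# Ids do not have to match file names. In these two sample entries the file
# name happens to be the id zero-padded to 12 digits, which is the kind of
# assumption some training code silently relies on.
matches_padded_id = all(
    f"{image_id:012d}.jpg" == name for image_id, name in file_name_by_id.items()
)
print(file_name_by_id, matches_padded_id)
```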

IEM-Computer-Vision / Your-Labels

COCO Annotations #2
