facebookresearch / d2go

D2Go is a toolkit for efficient deep learning
Apache License 2.0

ensure metadata thing_classes consistency with multiple datasets and category filtering #653

Closed stephenyan1231 closed 4 months ago

stephenyan1231 commented 4 months ago

Summary: Mask2Former RC4 training needs a weighted category training sampler, enabled by setting DATALOADER.SAMPLER_TRAIN = "WeightedCategoryTrainingSampler".

Training also uses multiple datasets whose category sets are not identical. Some datasets have more categories (e.g. Exo-body) than others that have no Exo-body annotations.

We also use category filtering, setting D2GO_DATA.DATASETS.TRAIN_CATEGORIES to a subset of the full category list.

In this setup, D2Go currently complains that metadata.thing_classes is not consistent across datasets (https://fburl.com/code/k8xbvyfd).

The reason is that when category filtering is used, D2Go writes a temporary dataset JSON file (https://fburl.com/code/slb5z6mc). This temporary JSON file is loaded when we get the dataset dicts from DatasetCatalog (https://fburl.com/code/5k4ynyhc). Meanwhile, the metadata in MetadataCatalog for the category-filtered dataset is also updated based on the categories stored in this temporary file.
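A minimal sketch of how such a mismatch can arise, using plain COCO-style dictionaries rather than D2Go's actual code. The helper names (`filter_categories_naive`, `thing_classes`), the sample datasets, and the category names are hypothetical; the sketch assumes that filtering keeps only those requested categories a given dataset actually contains.

```python
from typing import Dict, List


def filter_categories_naive(coco: Dict, train_categories: List[str]) -> Dict:
    """Keep only the requested categories that this dataset happens to contain."""
    wanted = set(train_categories)
    kept = [c for c in coco["categories"] if c["name"] in wanted]
    kept_ids = {c["id"] for c in kept}
    annotations = [a for a in coco["annotations"] if a["category_id"] in kept_ids]
    return {**coco, "categories": kept, "annotations": annotations}


def thing_classes(coco: Dict) -> List[str]:
    """Class names ordered by category id, similar in spirit to metadata.thing_classes."""
    return [c["name"] for c in sorted(coco["categories"], key=lambda c: c["id"])]


# Hypothetical dataset with Exo-body annotations.
dataset_a = {
    "images": [{"id": 1}],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 2, "name": "exo_body"},
        {"id": 3, "name": "chair"},
    ],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 2}],
}
# Hypothetical dataset without Exo-body annotations.
dataset_b = {
    "images": [{"id": 1}],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "chair"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
}

train_categories = ["person", "exo_body"]
classes_a = thing_classes(filter_categories_naive(dataset_a, train_categories))
classes_b = thing_classes(filter_categories_naive(dataset_b, train_categories))

print(classes_a)
print(classes_b)
print(classes_a == classes_b)
```

Under this assumption the two filtered datasets end up with different class lists, which is the kind of disagreement a cross-dataset consistency check on thing_classes would reject.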

Therefore, we must ensure that the categories stored in the temporary file are consistent across all category-filtered datasets.
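One way to meet this requirement can be sketched as follows. This is an illustration under stated assumptions, not the logic of this diff: the function `write_filtered_json`, the sample data, and the id remapping scheme are hypothetical. The idea is to derive the category list in the temporary file solely from the requested training categories, so every dataset gets the same names in the same order, whether or not it has annotations for each one.

```python
import json
import os
import tempfile
from typing import Dict, List


def write_filtered_json(coco: Dict, train_categories: List[str], out_dir: str) -> str:
    """Write a filtered COCO-style JSON whose categories depend only on train_categories."""
    # The same name -> id mapping is used for every dataset.
    name_to_new_id = {name: i + 1 for i, name in enumerate(train_categories)}
    old_id_to_name = {c["id"]: c["name"] for c in coco["categories"]}

    categories = [{"id": new_id, "name": name} for name, new_id in name_to_new_id.items()]
    annotations = []
    for ann in coco["annotations"]:
        # Remap by category name, since ids may differ between datasets.
        name = old_id_to_name.get(ann["category_id"])
        if name in name_to_new_id:
            annotations.append({**ann, "category_id": name_to_new_id[name]})

    filtered = {**coco, "categories": categories, "annotations": annotations}
    fd, path = tempfile.mkstemp(suffix=".json", dir=out_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(filtered, f)
    return path


# Hypothetical datasets; only dataset_a has Exo-body annotations.
dataset_a = {
    "images": [{"id": 1}],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 2, "name": "exo_body"},
        {"id": 3, "name": "chair"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 2},
        {"id": 2, "image_id": 1, "category_id": 3},
    ],
}
dataset_b = {
    "images": [{"id": 1}],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "chair"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 1, "category_id": 2},
    ],
}
train_categories = ["person", "exo_body"]

with tempfile.TemporaryDirectory() as tmp_dir:
    paths = [write_filtered_json(d, train_categories, tmp_dir) for d in (dataset_a, dataset_b)]
    loaded = []
    for p in paths:
        with open(p) as f:
            loaded.append(json.load(f))

class_lists = [[c["name"] for c in d["categories"]] for d in loaded]
print(class_lists[0] == class_lists[1])
```

Remapping by name rather than by id matters here: in the second sample dataset, id 2 means "chair", so a purely id-based filter could mislabel those annotations as Exo-body.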

In this diff, we update the logic for writing this temporary dataset JSON file.

Differential Revision: D54665936

Privacy Context Container: L1243674

facebook-github-bot commented 4 months ago

This pull request was exported from Phabricator. Differential Revision: D54665936

facebook-github-bot commented 4 months ago

This pull request has been merged in facebookresearch/d2go@1216c22534ca91cb9b33e0aa92b32d9509e3c012.