stephenyan1231 closed this 4 months ago
This pull request was exported from Phabricator. Differential Revision: D54665936
This pull request has been merged in facebookresearch/d2go@1216c22534ca91cb9b33e0aa92b32d9509e3c012.
Summary: In Mask2Former RC4 training, we need to use a particular weighted category training sampler, selected by setting
DATALOADER.SAMPLER_TRAIN = "WeightedCategoryTrainingSampler".
Also, multiple datasets are used, and their category sets are not exactly identical: some datasets have more categories (e.g. Exo-body) than others that do not have Exo-body annotations.
Also, we use category filtering by setting
D2GO_DATA.DATASETS.TRAIN_CATEGORIES
to a subset of the full categories. In this setup, D2GO currently complains that metadata.thing_classes is NOT consistent across datasets (https://fburl.com/code/k8xbvyfd).
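The error comes from a check that every training dataset reports the same metadata.thing_classes. A minimal sketch of such a check follows; `check_thing_classes_consistent`, the plain-dict input, and the category names are hypothetical stand-ins, not the D2GO or detectron2 API.

```python
from typing import Dict, List


def check_thing_classes_consistent(thing_classes: Dict[str, List[str]]) -> None:
    """Hypothetical stand-in for the consistency check D2GO runs.

    `thing_classes` maps dataset name -> that dataset's metadata.thing_classes.
    Raises ValueError if any dataset disagrees with the first one.
    """
    names = list(thing_classes)
    if not names:
        return
    reference_name = names[0]
    reference = thing_classes[reference_name]
    for name in names[1:]:
        # Order matters too: category ids are positions in thing_classes.
        if thing_classes[name] != reference:
            raise ValueError(
                "thing_classes is not consistent across datasets: "
                f"{reference_name}={reference} vs {name}={thing_classes[name]}"
            )


# Hypothetical example: after filtering, one dataset lacks "exo_body".
try:
    check_thing_classes_consistent(
        {"dataset_a": ["person", "exo_body"], "dataset_b": ["person"]}
    )
except ValueError as err:
    print(err)
```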
The reason is that when category filtering is used, D2GO writes a temporary dataset json file (https://fburl.com/code/slb5z6mc). This tmp json file is loaded when we get the dataset dicts from DatasetCatalog (https://fburl.com/code/5k4ynyhc). Meanwhile, the metadata in MetadataCatalog for this category-filtered dataset is also updated based on the categories stored in this tmp file.
Therefore, we must ensure the categories stored in the tmp file are consistent across multiple category-filtered datasets.
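To illustrate how the tmp file can produce the mismatch, here is a minimal sketch assuming COCO-style json and a naive filter that keeps only those TRAIN_CATEGORIES that a given dataset itself defines. The helper names, category names, datasets, and the filtering rule are all hypothetical; the actual D2GO code behind the links above may differ.

```python
import json
import os
import tempfile
from typing import Dict, List


def write_filtered_json_naive(dataset: Dict, train_categories: List[str]) -> str:
    """Hypothetical naive filter: keep only the train categories this dataset defines."""
    keep = [c for c in dataset["categories"] if c["name"] in train_categories]
    keep_ids = {c["id"] for c in keep}
    filtered = {
        "images": dataset["images"],
        "annotations": [
            a for a in dataset["annotations"] if a["category_id"] in keep_ids
        ],
        # The category list depends on the dataset, so it can differ per dataset.
        "categories": keep,
    }
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(filtered, f)
    return path


def thing_classes_from_json(path: str) -> List[str]:
    """Read category names back, mimicking how metadata is derived from the tmp file."""
    with open(path) as f:
        return [c["name"] for c in json.load(f)["categories"]]


# Hypothetical data: only the first dataset has Exo-body annotations.
TRAIN_CATEGORIES = ["person", "exo_body"]
with_exo = {
    "images": [{"id": 1}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 1, "category_id": 2},
        {"id": 3, "image_id": 1, "category_id": 3},
    ],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 2, "name": "hand"},
        {"id": 3, "name": "exo_body"},
    ],
}
without_exo = {
    "images": [{"id": 1}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 1, "category_id": 2},
    ],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "hand"}],
}

print(thing_classes_from_json(write_filtered_json_naive(with_exo, TRAIN_CATEGORIES)))
print(thing_classes_from_json(write_filtered_json_naive(without_exo, TRAIN_CATEGORIES)))
```

With this naive rule, the two tmp files list different categories, so the metadata built from them disagrees.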
In this diff, we update the logic for writing such a tmp dataset json file.
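One way to make the tmp files agree is sketched below: build the category list from TRAIN_CATEGORIES itself (in config order) and remap each dataset's annotation ids onto it, so a dataset without Exo-body annotations still lists that category. This assumes COCO-style json, and the helper name, datasets, and category names are hypothetical; it illustrates the goal stated above rather than reproducing the code in this diff.

```python
import json
import os
import tempfile
from typing import Dict, List


def write_filtered_json_consistent(dataset: Dict, train_categories: List[str]) -> str:
    """Hypothetical fix: every tmp file gets the same category list."""
    # Ids come from the config order, independent of the dataset.
    new_id = {name: i + 1 for i, name in enumerate(train_categories)}
    old_id_to_name = {c["id"]: c["name"] for c in dataset["categories"]}
    annotations = []
    for ann in dataset["annotations"]:
        name = old_id_to_name.get(ann["category_id"])
        if name in new_id:
            # Remap the annotation onto the shared id space.
            annotations.append({**ann, "category_id": new_id[name]})
    filtered = {
        "images": dataset["images"],
        "annotations": annotations,
        # Categories absent from this dataset are still listed.
        "categories": [{"id": new_id[n], "name": n} for n in train_categories],
    }
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(filtered, f)
    return path


# Hypothetical data: only the first dataset has Exo-body annotations.
TRAIN_CATEGORIES = ["person", "exo_body"]
with_exo = {
    "images": [{"id": 1}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 1, "category_id": 2},
        {"id": 3, "image_id": 1, "category_id": 3},
    ],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 2, "name": "hand"},
        {"id": 3, "name": "exo_body"},
    ],
}
without_exo = {
    "images": [{"id": 1}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 1, "category_id": 2},
    ],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "hand"}],
}

for ds in (with_exo, without_exo):
    with open(write_filtered_json_consistent(ds, TRAIN_CATEGORIES)) as f:
        print([c["name"] for c in json.load(f)["categories"]])
```

Because the category list no longer depends on which categories a dataset happens to contain, the metadata derived from each tmp file is identical, and category ids mean the same thing in every dataset.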
Differential Revision: D54665936
Privacy Context Container: L1243674