NVlabs / ODISE

Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]
https://arxiv.org/abs/2303.04803

512x512 configuration as in ablation studies #16

Open volgachen opened 1 year ago

volgachen commented 1 year ago

Hello, could you share the $512\times512$ configuration used in the ablation study? Does it change anything other than the resolution?

I simply changed every 1024 to 512 in configs/common/data/coco_panoptic_semseg.py (and halved the test-time max_size to match). The diff looks like this:

--- a/configs/common/data/coco_panoptic_semseg.py
+++ b/configs/common/data/coco_panoptic_semseg.py
@@ -49,10 +49,10 @@ dataloader.train = L(build_d2_train_dataloader)(
             L(T.ResizeScale)(
                 min_scale=0.1,
                 max_scale=2.0,
-                target_height=1024,
-                target_width=1024,
+                target_height=512,
+                target_width=512,
             ),
-            L(T.FixedSizeCrop)(crop_size=(1024, 1024)),
+            L(T.FixedSizeCrop)(crop_size=(512, 512)),
         ],
         image_format="RGB",
     ),
@@ -68,7 +68,7 @@ dataloader.test = L(build_d2_test_dataloader)(
     mapper=L(DatasetMapper)(
         is_train=False,
         augmentations=[
-            L(T.ResizeShortestEdge)(short_edge_length=1024, sample_style="choice", max_size=2560),
+            L(T.ResizeShortestEdge)(short_edge_length=512, sample_style="choice", max_size=1280),
diff --git a/configs/common/models/odise_with_caption.py b/configs/common/models/odise_with_caption.py
index e2862cb..03a2bf8 100644
--- a/configs/common/models/odise_with_caption.py
+++ b/configs/common/models/odise_with_caption.py
@@ -25,7 +25,7 @@ model.backbone = L(FeatureExtractorBackbone)(
     ),
     out_features=["s2", "s3", "s4", "s5"],
     use_checkpoint=True,
-    slide_training=True,
+    slide_training=False,

I assume $512\times512$ does not require sliding-window training, so I set slide_training=False as well. Are these changes consistent with your configuration?
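
To make the arithmetic behind the diff explicit, here is a minimal sketch that rescales the resolution-related values together. It uses a plain dictionary as a hypothetical stand-in for the settings touched above, not the actual LazyConfig objects from the repository. The helper name `scale_resolution` and the dictionary layout are assumptions for illustration only, and whether these are the settings the authors used for the ablation is exactly the open question in this issue.

```python
from copy import deepcopy

# Hypothetical plain-dict stand-in for the values changed in the diff above.
# The real configs are detectron2 LazyConfig files; this only mirrors the numbers.
BASE_1024 = {
    "train": {"target_height": 1024, "target_width": 1024, "crop_size": (1024, 1024)},
    "test": {"short_edge_length": 1024, "max_size": 2560},
    "model": {"slide_training": True},
}


def scale_resolution(cfg, new_size, base_size=1024, slide_training=False):
    """Return a copy of cfg with every resolution value rescaled to new_size."""
    out = deepcopy(cfg)
    out["train"]["target_height"] = new_size
    out["train"]["target_width"] = new_size
    out["train"]["crop_size"] = (new_size, new_size)
    out["test"]["short_edge_length"] = new_size
    # Keep the original max_size / short_edge_length ratio (2560 / 1024 = 2.5).
    out["test"]["max_size"] = cfg["test"]["max_size"] * new_size // base_size
    # Assumption from the post: sliding-window training is turned off at 512.
    out["model"]["slide_training"] = slide_training
    return out


cfg_512 = scale_resolution(BASE_1024, 512)
print(cfg_512)
```

Scaling `max_size` by the same factor as the short edge keeps the test-time aspect-ratio limit unchanged, which is why 2560 becomes 1280 in the diff.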