MengyuWang826 / SegRefiner

SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process

The inference code is not functioning correctly. #5

Closed: hyunghoon-kim closed this issue 8 months ago

hyunghoon-kim commented 8 months ago

I have organized the DIS-5K folder tree structure as you requested:

data
├── big
│   ├── test
│   └── val
├── dis
│   ├── DIS-TE1
│   ├── DIS-TE2
│   ├── DIS-TE3
│   ├── DIS-TE4
│   ├── DIS-TR
│   └── DIS-VD

However, when running the inference code, I encounter the following error:

/home/khh/miniconda3/envs/segrefiner/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
Traceback (most recent call last):
  File "/home/khh/miniconda3/envs/segrefiner/lib/python3.8/site-packages/mmcv/utils/registry.py", line 69, in build_from_cfg
    return obj_cls(**args)
  File "/home/khh/workspace/SegRefiner/mmdet/datasets/dis.py", line 30, in __init__
    self.load_data()
  File "/home/khh/workspace/SegRefiner/mmdet/datasets/dis.py", line 49, in load_data
    all_files = os.listdir(os.path.join(self.data_root, coarse_dir))
FileNotFoundError: [Errno 2] No such file or directory: 'data/dis/coarse/isnet/DIS-TE1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/test.py", line 237, in <module>
    main()
  File "tools/test.py", line 184, in main
    dataset = build_dataset(cfg.data.test)
  File "/home/khh/workspace/SegRefiner/mmdet/datasets/builder.py", line 82, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/home/khh/miniconda3/envs/segrefiner/lib/python3.8/site-packages/mmcv/utils/registry.py", line 72, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
FileNotFoundError: DISDataset: [Errno 2] No such file or directory: 'data/dis/coarse/isnet/DIS-TE1'

Could you please help me identify the root cause of this issue?

MengyuWang826 commented 8 months ago

> I have organized the DIS-5K folder tree structure as you requested: [...] However, when running the inference code, I encounter the following error: FileNotFoundError: DISDataset: [Errno 2] No such file or directory: 'data/dis/coarse/isnet/DIS-TE1'

> Could you please help me identify the root cause of this issue?

The inference process requires the predictions of previous models as the input coarse masks. For the DIS dataset, you can download the corresponding files [here](https://drive.google.com/file/d/1PoI4R-thDYhAjqOaCwyXqvAaZFEJxWnT/view). The folder tree in README.md only shows the structure of the original dataset; for the coarse masks, the corresponding structure is as follows:

data
└── dis
    ├── DIS-TE1
    ├── DIS-TE2
    ├── DIS-TE3
    ├── DIS-TE4
    ├── DIS-TR
    ├── DIS-VD
    └── coarse
        ├── model_1_name
        │   ├── DIS-TE1
        │   ├── DIS-TE2
        │   ├── DIS-TE3
        │   └── DIS-TE4
        └── model_2_name
            ├── DIS-TE1
            ├── DIS-TE2
            ├── DIS-TE3
            └── DIS-TE4
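
To sanity-check the layout before launching inference, you can mirror the lookup that load_data performs (the traceback above shows it calling os.listdir(os.path.join(self.data_root, coarse_dir))). The following is a minimal sketch; the model folder name isnet and the split list are placeholders for whichever coarse masks you downloaded:

import os

# Sketch: verify the coarse-mask layout DISDataset expects.
# Every <data_root>/coarse/<model_name>/<split> directory must exist
# before tools/test.py can build the dataset.
data_root = 'data/dis'
model_names = ['isnet']  # placeholder: your downloaded coarse-mask folders
splits = ['DIS-TE1', 'DIS-TE2', 'DIS-TE3', 'DIS-TE4']

for model_name in model_names:
    for split in splits:
        path = os.path.join(data_root, 'coarse', model_name, split)
        print(path, 'ok' if os.path.isdir(path) else 'MISSING')
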
hyunghoon-kim commented 8 months ago

With the data organized in the structure you mentioned and a checkpoint fully trained on DIS-5K and ThinObject, should I expect correct outputs? I have been getting only black images as output, and I suspect there might be an issue with the training. Thank you for your valuable research efforts!

hyunghoon-kim commented 8 months ago

I used the provided config to train on ThinObject and DIS-5K, and when I performed inference on DIS-TE1 using the coarse masks you provided, I only got black predictions. Have I done something wrong?

MengyuWang826 commented 8 months ago

> I used the provided config to train on ThinObject and DIS-5K, and when I performed inference on DIS-TE1 using the coarse masks you provided, I only got black predictions. Have I done something wrong?

Thank you for your feedback. I will make time to test the training code for HR-SegRefiner and reply to your questions. In the meantime, could you please paste your detailed training settings (such as the number of GPUs, batch size, etc.) and the training log here?

hyunghoon-kim commented 8 months ago

> I used the provided config to train on ThinObject and DIS-5K, and when I performed inference on DIS-TE1 using the coarse masks you provided, I only got black predictions. Have I done something wrong?

> Thank you for your feedback. I will make time to test the training code for HR-SegRefiner and reply to your questions. In the meantime, could you please paste your detailed training settings (such as the number of GPUs, batch size, etc.) and the training log here?

Thank you for your cooperation. I am very curious about your model's performance. I used the config and commands as written in the README.md, with a single GPU. I am attaching the training log and the config below: 20240112_095339.log

checkpoint_config = dict(
    interval=5000, by_epoch=False, save_last=True, max_keep_ckpts=20)
log_config = dict(
    interval=50, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 5000)]
opencv_num_threads = 0
mp_start_method = 'fork'
auto_scale_lr = dict(enable=False, base_batch_size=16)
object_size = 256
task = 'instance'
model = dict(
    type='SegRefiner',
    task='instance',
    step=6,
    denoise_model=dict(
        type='DenoiseUNet',
        in_channels=4,
        out_channels=1,
        model_channels=128,
        num_res_blocks=2,
        num_heads=4,
        num_heads_upsample=-1,
        attention_strides=(16, 32),
        learn_time_embd=True,
        channel_mult=(1, 1, 2, 2, 4, 4),
        dropout=0.0),
    diffusion_cfg=dict(
        betas=dict(type='linear', start=0.8, stop=0, num_timesteps=6),
        diff_iter=False),
    test_cfg=dict())
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='LoadAnnotations',
        with_bbox=False,
        with_label=False,
        with_mask=True),
    dict(type='LoadPatchData', object_size=256, patch_size=256),
    dict(type='Resize', img_scale=(256, 256), keep_ratio=False),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='DefaultFormatBundle'),
    dict(
        type='Collect',
        keys=[
            'object_img', 'object_gt_masks', 'object_coarse_masks',
            'patch_img', 'patch_gt_masks', 'patch_coarse_masks'
        ])
]
dataset_type = 'HRCollectionDataset'
img_root = '/share/project/datasets/MSCOCO/coco2017/'
ann_root = '/share/project/datasets/LVIS/'
train_dataloader = dict(samples_per_gpu=1, workers_per_gpu=1)
data = dict(
    train=dict(
        type='HRCollectionDataset',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='LoadAnnotations',
                with_bbox=False,
                with_label=False,
                with_mask=True),
            dict(type='LoadPatchData', object_size=256, patch_size=256),
            dict(type='Resize', img_scale=(256, 256), keep_ratio=False),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=[
                    'object_img', 'object_gt_masks', 'object_coarse_masks',
                    'patch_img', 'patch_gt_masks', 'patch_coarse_masks'
                ])
        ],
        data_root='data/',
        collection_datasets=['thin', 'dis'],
        collection_json='data/collection_hr.json'),
    train_dataloader=dict(samples_per_gpu=1, workers_per_gpu=1),
    val=dict(),
    test=dict())
optimizer = dict(
    type='AdamW', lr=0.0004, weight_decay=0, eps=1e-08, betas=(0.9, 0.999))
optimizer_config = dict(grad_clip=None)
max_iters = 120000
runner = dict(type='IterBasedRunner', max_iters=120000)
lr_config = dict(
    policy='step',
    gamma=0.5,
    by_epoch=False,
    step=[80000, 100000],
    warmup='linear',
    warmup_by_epoch=False,
    warmup_ratio=1.0,
    warmup_iters=10)
interval = 5000
data_root = 'data/'
work_dir = './work_dirs/segrefiner_hr'
auto_resume = False
gpu_ids = [0]
MengyuWang826 commented 8 months ago

@hyunghoon-kim Sorry for the delayed response. This is a training-instability issue caused by a batch size that is too small. You can try reducing the learning rate to help stabilize training. For detailed settings, you can refer to the information provided in #4.
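
In config terms, that change amounts to editing the optimizer entry of the training config pasted above; a minimal sketch (the exact value may need tuning for your effective batch size):

# Sketch: lower the base learning rate to compensate for a small effective
# batch size (the config above uses samples_per_gpu=1).
optimizer = dict(
    type='AdamW', lr=0.0001, weight_decay=0, eps=1e-08, betas=(0.9, 0.999))
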

hyunghoon-kim commented 8 months ago

@MengyuWang826 Thank you for your guidance. I used two GPUs, but I could not use a larger batch size at 512x512, so I changed the learning rate to 0.0001 instead, and it worked well.

Furthermore, I would like to hear your opinion. I would like the output to be a grayscale (soft) mask rather than a binary map. I am not sure whether it is possible, but I would like to try modifying the loss function to see if it is achievable. Where can the loss function be modified?

MengyuWang826 commented 8 months ago

> @MengyuWang826 Thank you for your guidance. I used two GPUs, but I could not use a larger batch size at 512x512, so I changed the learning rate to 0.0001 instead, and it worked well.

> Furthermore, I would like to hear your opinion. I would like the output to be a grayscale (soft) mask rather than a binary map. I am not sure whether it is possible, but I would like to try modifying the loss function to see if it is achievable. Where can the loss function be modified?

The two loss functions used in the experiments are located at mmdet/models/losses/cross_entropy_loss.py and mmdet/models/losses/textrue_l1_loss.py.
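
As an illustration only (this class is hypothetical, not part of the repo), a soft grayscale target could be supervised with an L1 regression loss on the predicted mask probabilities instead of binary cross-entropy; the sketch below follows the standard MMDetection 2.x loss-registration pattern that the two files above use:

import torch
import torch.nn as nn
from mmdet.models.builder import LOSSES

@LOSSES.register_module()
class SoftMaskL1Loss(nn.Module):
    """Hypothetical L1 loss on mask probabilities, allowing soft
    (grayscale) targets in [0, 1] instead of hard binary labels."""

    def __init__(self, loss_weight=1.0):
        super().__init__()
        self.loss_weight = loss_weight

    def forward(self, pred, target):
        # pred: raw logits from the denoising UNet; target: soft mask in [0, 1]
        return self.loss_weight * (pred.sigmoid() - target).abs().mean()
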