ajzhai / PEANUT

[ICCV 2023] PEANUT: Predicting and Navigating to Unseen Targets
https://ajzhai.github.io/PEANUT/
MIT License

Evaluation on Map Prediction #5

Closed · Jaraxxus-Me closed 9 months ago

Jaraxxus-Me commented 9 months ago

Hi, thanks for this great work! I see that the provided saved semantic maps contain train and val folders. Have you tried using the val data to validate and test the map predictor? I tried to integrate an evaluation hook during training, but the results are mostly all-zero IoU; I'm not sure if I'm doing something wrong. Also, I ran all 2000 episodes in the HM3D val set, but only 30 of them got a non-zero map prediction. I am using your official model; is this expected?

ajzhai commented 9 months ago

That is not expected; the map prediction should never be all zero. Which parts of the code have you modified?

Jaraxxus-Me commented 9 months ago

Hi, thanks for the feedback! I only added this flag check in collect.py (flag is initialized to False before the episode loop) and didn't modify anything else:

while not hab_env.episode_over:
    action = nav_agent.act(observations)
    if nav_agent.agent_states.target_pred.max() > 0:
        flag = True  # the predicted target map has at least one nonzero cell
    observations = hab_env.step(action)

I did the evaluation on the 2000 episodes in the HM3D val set, and there seem to be only 30 episodes where flag=True. I also used the visualizer to show the map prediction; most of the outputs look like this: 0-0-Vis-32. I think the value map is the original prediction (Z_t in Eqn. (3) of the paper) and the Dist Weight is the term used in Eqn. (4)?
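
To make sure I understand, here is a minimal numpy sketch of how I read the goal selection in Eqns (3)-(4). The names select_goal, dist_map, and gamma are my own, and the exponential discount is just a placeholder for the paper's actual distance weighting:

import numpy as np

def select_goal(target_pred, dist_map, gamma=0.99):
    """target_pred: (H, W) predicted target probability (Z_t in Eqn (3)).
    dist_map: (H, W) distance from the agent to each map cell."""
    dist_weight = gamma ** dist_map    # discount cells far from the agent
    value = target_pred * dist_weight  # Eqn (4)-style distance-weighted value
    return np.unravel_index(np.argmax(value), value.shape)  # (row, col) goal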

Jaraxxus-Me commented 9 months ago

I also updated prediction/train_prediction_model.py as follows:

import torch, torchvision
import torch.nn as nn
import torch.nn.functional as F
import mmseg

import mmcv
import os.path as osp
import numpy as np
from scipy.special import expit
from PIL import Image
import matplotlib.pyplot as plt

from mmseg.datasets.builder import PIPELINES
from mmseg.datasets.builder import DATASETS
from mmseg.datasets.custom import CustomDataset
from mmcv.utils import print_log
from mmseg.utils import get_root_logger
from mmseg.models.builder import LOSSES
from mmseg.models.losses.utils import weighted_loss
from mmcv import Config
from mmseg.apis import set_random_seed
from mmseg.utils import get_device
from mmseg.datasets import build_dataset
from mmseg.models import build_segmentor
from mmseg.apis import train_segmentor
from mmseg.core import intersect_and_union

NUM_TARGET_CATEGORIES = 6
NUM_EXTRA_CATEGORIES = 3

def sigmoid(x):
    return expit(x)

@PIPELINES.register_module()
class LoadMapFromFile(object):
    """
    Load semantic maps from file.
    Requires key "img_info" (a dict that must contain the key "filename"). 
    """

    def __init__(self,
                 to_float32=False,
                 file_client_args=dict(backend='disk'),
                 imdecode_backend='np'):
        self.to_float32 = to_float32
        self.file_client_args = file_client_args.copy()
        self.file_client = None
        self.imdecode_backend = imdecode_backend

    def __call__(self, results):
        """
        Call functions to load image and get image meta information.
        Args:
            results (dict): Result dict from :obj:`mmseg.CustomDataset`.
        Returns:
            dict: The dict contains loaded image and meta information.
        """

        if self.file_client is None:
            self.file_client = mmcv.FileClient(**self.file_client_args)

        if results.get('img_prefix') is not None:
            filename = osp.join(results['img_prefix'],
                                results['img_info']['filename'])
        else:
            filename = results['img_info']['filename']
        maps = np.load(filename)
        if filename[-1] == 'z':  # .npz archives store the maps under the 'maps' key
            maps = maps['maps']
        img = maps[results['img_info']['t_idx']].transpose(1, 2, 0)  # (C, H, W) -> (H, W, C)
        img = img.astype(np.float32) / 255.

        results['filename'] = filename
        results['ori_filename'] = results['img_info']['filename']
        results['img'] = img
        results['img_shape'] = img.shape
        results['ori_shape'] = img.shape

        # Set initial values for default meta_keys
        results['pad_shape'] = img.shape
        results['scale_factor'] = 1.0
        num_channels = img.shape[2]  # channels are last after the transpose above
        results['img_norm_cfg'] = dict(
            mean=np.zeros(num_channels, dtype=np.float32),
            std=np.ones(num_channels, dtype=np.float32),
            to_rgb=False)

        mask = (img[:, :, 1] > 0)  # presumably the explored-area channel of the partial map
        goals = range(4, 4 + NUM_TARGET_CATEGORIES)  # target-category channels of the semantic map

        # Setting the "ground-truth" for prediction here: target cells from the
        # complete (final) map, restricted to regions unexplored at timestep t
        results['gt_semantic_seg'] = (maps[-1, goals] * (1 - mask)).transpose(1, 2, 0)
        results['seg_fields'].append('gt_semantic_seg')
        return results

    def __repr__(self):
        repr_str = self.__class__.__name__
        repr_str += f'(to_float32={self.to_float32},'
        repr_str += f"imdecode_backend='{self.imdecode_backend}')"
        return repr_str

@DATASETS.register_module()
class SemMapDataset(CustomDataset):

    CLASSES = ['chair', 'couch', 'potted plant', 'bed', 'toilet', 'tv', 'dining-table', 'oven',
               'sink', 'refrigerator', 'book', 'clock', 'vase', 'cup', 'bottle']
    PALETTE = np.array([
        1.0, 1.0, 1.0,
        0.6, 0.6, 0.6,
        0.95, 0.95, 0.95,
        0.96, 0.36, 0.26,
        0.12156862745098039, 0.47058823529411764, 0.7058823529411765,
        0.9400000000000001, 0.7818, 0.66,
        0.9400000000000001, 0.8868, 0.66,
        0.8882000000000001, 0.9400000000000001, 0.66,
        0.7832000000000001, 0.9400000000000001, 0.66,
        0.6782000000000001, 0.9400000000000001, 0.66,
        0.66, 0.9400000000000001, 0.7468000000000001,
        0.66, 0.9400000000000001, 0.8518000000000001,
        0.66, 0.9232, 0.9400000000000001,
        0.66, 0.8182, 0.9400000000000001,
        0.66, 0.7132, 0.9400000000000001,
        0.7117999999999999, 0.66, 0.9400000000000001,
        0.8168, 0.66, 0.9400000000000001,
        0.9218, 0.66, 0.9400000000000001,
        0.9400000000000001, 0.66, 0.8531999999999998,
        0.9400000000000001, 0.66, 0.748199999999999]).reshape((20, 3)) * 255

    def __init__(self, **kwargs):
        super().__init__(img_suffix='.npz', seg_map_suffix='.npz', 
                         split=None, **kwargs)
        assert osp.exists(self.img_dir) 

    def load_annotations(self, img_dir, img_suffix, ann_dir, seg_map_suffix,
                         split):
        """
        Load annotation from directory.
        Args:
            img_dir (str): Path to image directory
            img_suffix (str): Suffix of images.
            ann_dir (str|None): Path to annotation directory.
            seg_map_suffix (str|None): Suffix of segmentation maps.
            split (str|None): Split txt file. If split is specified, only file
                with suffix in the splits will be loaded. Otherwise, all images
                in img_dir/ann_dir will be loaded. Default: None
        Returns:
            list[dict]: All image info of dataset.
        """

        img_infos = []
        k = 0  # number of .npz map files found

        for img in self.file_client.list_dir_or_file(
                dir_path=img_dir,
                list_dir=False,
                suffix=img_suffix,
                recursive=True):
            k += 1
            for t_idx in range(10):  # use first 10 timesteps as partial map inputs
                img_info = dict(filename=img)
                img_info['t_idx'] = t_idx
                img_infos.append(img_info)

        img_infos = sorted(img_infos, key=lambda x: x['filename'])

        print_log(f'Loaded {len(img_infos)} images from {k} .npz files', logger=get_root_logger())
        return img_infos

    def get_ann_info(self, idx):
        """
        Get annotation by index.
        """
        # We don't have separate annotation files, everything is in the map sequence
        return None

    def get_gt_seg_map_by_idx(self, index):
        """Get one ground truth segmentation map for evaluation."""
        img_info = self.img_infos[index]
        results = dict(img_info=img_info)
        self.pre_pipeline(results)
        results = self.pipeline(results)
        return results['gt_semantic_seg']

    def pre_eval(self, preds, indices):
        """Collect eval result from each iteration.

        Args:
            preds (list[torch.Tensor] | torch.Tensor): the segmentation logit
                after argmax, shape (N, H, W).
            indices (list[int] | int): the prediction related ground truth
                indices.

        Returns:
            list[torch.Tensor]: (area_intersect, area_union, area_prediction,
                area_ground_truth).
        """
        # To be compatible with batch inference
        if not isinstance(indices, list):
            indices = [indices]
        if not isinstance(preds, list):
            preds = [preds]

        pre_eval_results = []

        for pred, index in zip(preds, indices):
            # Original pred is logit, we need to convert it to probability
            pred = sigmoid(pred)
            seg_map = self.get_gt_seg_map_by_idx(index)
            seg_map = seg_map[0].transpose(2, 0, 1) / 255.
            pre_eval_results.append(
                intersect_and_union(
                    pred,
                    seg_map,
                    NUM_TARGET_CATEGORIES,
                    self.ignore_index,
                    # The labels were already converted when the dataset was
                    # initialized in `get_palette_for_custom_classes`, so this
                    # `label_map` should be `dict()`; see
                    # https://github.com/open-mmlab/mmsegmentation/issues/1415
                    # for more details.
                    label_map=dict(),
                    reduce_zero_label=self.reduce_zero_label))

        return pre_eval_results

@weighted_loss
def my_loss(pred, target):
    target = torch.permute(target, (0, 3, 1, 2))  # (N, H, W, C) -> (N, C, H, W)
    assert pred.size() == target.size() and target.numel() > 0
    # Inverse-frequency weights for the 6 target categories
    wts = [36.64341412, 30.19407855, 106.23704066, 25.58503269, 100.4556983, 167.64383946]
    pos_weight = torch.ones(pred[0].shape).to(pred.device)
    for i, wt in enumerate(wts):
        pos_weight[i] = wt  # fill each class channel with its weight

    loss = F.binary_cross_entropy_with_logits(pred, target / 255., reduction='none')  # no weighting
    # loss = F.binary_cross_entropy_with_logits(pred, target / 255., reduction='none', pos_weight=pos_weight)
    return loss

@LOSSES.register_module()
class MyLoss(nn.Module):

    def __init__(self, reduction='mean', loss_weight=1.0):
        super(MyLoss, self).__init__()
        self.reduction = reduction
        self.loss_weight = loss_weight

    def forward(self,
                pred,
                target,
                weight=None,
                avg_factor=None,
                reduction_override=None,
                ignore_index=None):
        assert reduction_override in (None, 'none', 'mean', 'sum')
        reduction = (
            reduction_override if reduction_override else self.reduction)
        loss = self.loss_weight * my_loss(
            pred, target, weight, reduction=reduction, avg_factor=avg_factor)
        return loss

    @property
    def loss_name(self):
        return 'loss_bce'

if __name__ == '__main__':

    cfg = Config.fromfile('prediction/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py')

    # Since we use only one GPU, BN is used instead of SyncBN
    cfg.norm_cfg = dict(type='BN', requires_grad=True)
    cfg.model.backbone.norm_cfg = cfg.norm_cfg
    cfg.model.decode_head.norm_cfg = cfg.norm_cfg
    cfg.model.backbone.in_channels = 4 + NUM_TARGET_CATEGORIES + NUM_EXTRA_CATEGORIES + 1
    cfg.model.decode_head.num_classes = NUM_TARGET_CATEGORIES
    cfg.model.decode_head.loss_decode = dict(type='MyLoss', loss_weight=1.0)

    cfg.model.auxiliary_head.num_classes = NUM_TARGET_CATEGORIES
    cfg.model.auxiliary_head.loss_decode = dict(type='MyLoss', loss_weight=0.4)
    cfg.model.auxiliary_head.norm_cfg = cfg.norm_cfg

    # Modify dataset type and path
    cfg.dataset_type = 'SemMapDataset'
    cfg.data_root = 'data/saved_maps'

    cfg.data.samples_per_gpu = 8
    cfg.data.workers_per_gpu = 2

    cfg.img_norm_cfg = dict(
        mean=[0, 0, 0], std=[1, 1, 1], to_rgb=False)

    orig_in_size = 960   # the map size
    in_size = orig_in_size
    cfg.crop_size = (in_size, in_size)
    cfg.train_pipeline = [
        dict(type='LoadMapFromFile'),
        dict(type='Resize', img_scale=None, ratio_range=(in_size / orig_in_size, in_size / orig_in_size)),
        dict(type='Pad', size=(int(in_size * 1.25), int(in_size * 1.25)), pad_val=0, seg_pad_val=0),
        dict(type='RandomCrop', crop_size=cfg.crop_size, cat_max_ratio=1.),
        dict(type='RandomFlip', flip_ratio=0.5),
        dict(type='RandomRotate', prob=1., degree=180, pad_val=0, seg_pad_val=0),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_semantic_seg']),
    ]

    cfg.test_pipeline = [
        dict(type='LoadMapFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=None,
            img_ratios=[in_size / orig_in_size],
            flip=False,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img', 'gt_semantic_seg']),
            ])
    ]

    cfg.data.train.type = cfg.dataset_type
    cfg.data.train.data_root = cfg.data_root
    cfg.data.train.img_dir = 'train' 
    cfg.data.train.ann_dir = None
    cfg.data.train.pipeline = cfg.train_pipeline

    cfg.data.val.type = cfg.dataset_type
    cfg.data.val.data_root = cfg.data_root
    cfg.data.val.img_dir = 'val'
    cfg.data.val.ann_dir = None
    cfg.data.val.pipeline = cfg.test_pipeline

    cfg.data.test.type = cfg.dataset_type
    cfg.data.test.data_root = cfg.data_root
    cfg.data.test.img_dir = 'val'  
    cfg.data.test.ann_dir = None
    cfg.data.test.pipeline = cfg.test_pipeline

    # Set up working dir to save files and logs.
    cfg.work_dir = 'data/work_dirs/final_model'

    cfg.runner.max_iters = 60000
    cfg.log_config.interval = 100
    cfg.evaluation.interval = 200
    cfg.checkpoint_config.interval = 2000
    cfg.optimizer = dict(type='Adam', lr=0.0005)
    cfg.lr_config.min_lr = 1e-5

    # Set seed to facilitate reproducing the result
    cfg.seed = 0
    set_random_seed(0, deterministic=False)
    cfg.gpu_ids = range(1)
    cfg.device = get_device()

    # Let's have a look at the final config used for training
    print(f'Config:\n{cfg}')

    # Build the dataset
    datasets = [build_dataset(cfg.data.train)]

    # Build the segmentor
    model = build_segmentor(cfg.model)

    # Add an attribute for visualization convenience
    model.CLASSES = datasets[0].CLASSES

    # Create work_dir
    mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
    cfg.dump(osp.join(cfg.work_dir, 'cfg.py'))
    model.train()
    train_segmentor(model, datasets, cfg, distributed=False, validate=False, 
                    meta=dict())
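
One note on my_loss above: pos_weight in binary_cross_entropy_with_logits scales only the positive (target = 1) term, which is what the commented-out weighted variant relies on to boost the rare classes. A quick self-contained check (shapes and values are just illustrative):

import torch
import torch.nn.functional as F

x = torch.zeros(1, 6, 2, 2)      # zero logits -> sigmoid = 0.5 everywhere
t = torch.ones(1, 6, 2, 2)       # all-positive targets
w = torch.full((6, 2, 2), 10.0)  # per-class positive weight
plain = F.binary_cross_entropy_with_logits(x, t, reduction='none')
weighted = F.binary_cross_entropy_with_logits(x, t, reduction='none', pos_weight=w)
assert torch.allclose(weighted, 10 * plain)  # positive term scaled by pos_weight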

Basically, I added the get_gt_seg_map_by_idx and pre_eval functions to the SemMapDataset class; they work together with the single_gpu_test function in prediction/mmseg/apis/test.py (shown here) to enable evaluation:

def single_gpu_test(model,
                    data_loader,
                    show=False,
                    out_dir=None,
                    efficient_test=False,
                    opacity=0.5,
                    pre_eval=False,
                    format_only=False,
                    format_args={}):
    """Test with single GPU by progressive mode.

    Args:
        model (nn.Module): Model to be tested.
        data_loader (utils.data.Dataloader): Pytorch data loader.
        show (bool): Whether show results during inference. Default: False.
        out_dir (str, optional): If specified, the results will be dumped into
            the directory to save output results.
        efficient_test (bool): Whether save the results as local numpy files to
            save CPU memory during evaluation. Mutually exclusive with
            pre_eval and format_results. Default: False.
        opacity(float): Opacity of painted segmentation map.
            Default 0.5.
            Must be in (0, 1] range.
        pre_eval (bool): Use dataset.pre_eval() function to generate
            pre_results for metric evaluation. Mutually exclusive with
            efficient_test and format_results. Default: False.
        format_only (bool): Only format result for results commit.
            Mutually exclusive with pre_eval and efficient_test.
            Default: False.
        format_args (dict): The args for format_results. Default: {}.
    Returns:
        list: list of evaluation pre-results or list of save file names.
    """
    if efficient_test:
        warnings.warn(
            'DeprecationWarning: ``efficient_test`` will be deprecated, the '
            'evaluation is CPU memory friendly with pre_eval=True')
        mmcv.mkdir_or_exist('.efficient_test')
    # when none of them is set true, return segmentation results as
    # a list of np.array.
    assert [efficient_test, pre_eval, format_only].count(True) <= 1, \
        '``efficient_test``, ``pre_eval`` and ``format_only`` are mutually ' \
        'exclusive, only one of them can be true.'

    model.eval()
    results = []
    dataset = data_loader.dataset
    prog_bar = mmcv.ProgressBar(len(dataset))
    # How the data_loader retrieves samples from the dataset:
    # sampler -> batch_sampler -> indices
    # The indices are passed to the dataset fetcher to get data from the dataset:
    # data_fetcher -> collate_fn(dataset[index]) -> data_sample
    # We use batch_sampler to get the correct data indices.
    loader_indices = data_loader.batch_sampler

    for batch_indices, data in zip(loader_indices, data_loader):
        with torch.no_grad():
            result = model(return_loss=False, **data)

        if show or out_dir:
            img_tensor = data['img'][0]
            img_metas = data['img_metas'][0].data[0]
            imgs = tensor2imgs(img_tensor, **img_metas[0]['img_norm_cfg'])
            assert len(imgs) == len(img_metas)

            for img, img_meta in zip(imgs, img_metas):
                h, w, _ = img_meta['img_shape']
                img_show = img[:h, :w, :]

                ori_h, ori_w = img_meta['ori_shape'][:-1]
                img_show = mmcv.imresize(img_show, (ori_w, ori_h))

                if out_dir:
                    out_file = osp.join(out_dir, img_meta['ori_filename'])
                else:
                    out_file = None

                model.module.show_result(
                    img_show,
                    result,
                    palette=dataset.PALETTE,
                    show=show,
                    out_file=out_file,
                    opacity=opacity)

        if efficient_test:
            result = [np2tmp(_, tmpdir='.efficient_test') for _ in result]

        if format_only:
            result = dataset.format_results(
                result, indices=batch_indices, **format_args)
        if pre_eval:
            # TODO: adapt samples_per_gpu > 1.
            # only samples_per_gpu=1 valid now
            result = dataset.pre_eval(result, indices=batch_indices)
            results.extend(result)
        else:
            results.extend(result)

        batch_size = len(result)
        for _ in range(batch_size):
            prog_bar.update()

    return results
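
For completeness, this is roughly how I drive the evaluation, following the standard mmseg 0.x test-script wiring (cfg and model are the ones built in the training script above; the exact build_dataloader arguments are my guess):

from mmcv.parallel import MMDataParallel
from mmseg.datasets import build_dataloader, build_dataset

val_dataset = build_dataset(cfg.data.val)
val_loader = build_dataloader(
    val_dataset, samples_per_gpu=1, workers_per_gpu=2, dist=False, shuffle=False)

eval_model = MMDataParallel(model, device_ids=[0])
results = single_gpu_test(eval_model, val_loader, pre_eval=True)
# CustomDataset.evaluate aggregates the per-sample (intersect, union, ...) tuples
metrics = val_dataset.evaluate(results, metric='mIoU')
print(metrics)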

I got zero IoU during evaluation on the val set; I'm not sure where I went wrong.

ajzhai commented 9 months ago

If the IoU is computed by taking the argmax over the classes, it may indeed be zero, because the model is trained with binary cross-entropy and there is no "empty" class. However, something is definitely wrong with your prediction outputs, because the visualizer should produce results like the videos on our project page. For example: 0-0-Vis-21
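
If you want an IoU that is meaningful for BCE-trained multi-label outputs, a thresholded per-class IoU would be more appropriate than argmax. A minimal sketch, assuming (C, H, W) probability and binary ground-truth arrays (the 0.5 threshold is arbitrary):

import numpy as np

def per_class_iou(prob, gt, thresh=0.5):
    pred = prob > thresh                               # binarize each class channel
    gt = gt > 0.5
    inter = np.logical_and(pred, gt).sum(axis=(1, 2))
    union = np.logical_or(pred, gt).sum(axis=(1, 2))
    return inter / np.maximum(union, 1)                # one IoU per target category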

Are you sure that the weights are being loaded successfully? And is the navigation success rate OK?

Jaraxxus-Me commented 9 months ago

The weights should be loading successfully. I am a bit confused about how to reproduce your success rate: it seems I can only do this for the HM3D val set (Table 4), but I'm not sure which 500 episodes you used, since there are 2000 episodes in total. The figure I attached is from the first episode of the HM3D val set. Could you tell me which episode you used for the figure you attached? I can try that.

ajzhai commented 9 months ago

It was an old image, but I believe it was also the first episode. Table 4 does use the first 500 episodes, but I am not asking for an exact number; I am just wondering whether the agent's behavior looks normal. For example, you can just check the first 10 episodes; I think there should be 6 or 7 successes.

Jaraxxus-Me commented 9 months ago

Yes, I'm able to get non-zero output now, and I'm testing the first 500 episodes. It turns out that I was using my locally re-trained model (iter_60000.pth), so perhaps the re-training failed. I'm also testing the prediction accuracy. Thanks for the help; I'll reach out if I have further questions.

Jaraxxus-Me commented 9 months ago

Hi, I've finished the evaluation on the first 500 episodes, but I only get 0.311 SPL and 0.608 Success. Which version of the HM3D scene dataset did you use in Table 4? Is it v0.1? I'm using v0.2, so I assume the score difference comes from that.
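
(For reference, SPL here is the standard Success weighted by Path Length from the Habitat evaluation:

$$\mathrm{SPL} = \frac{1}{N}\sum_{i=1}^{N} S_i \, \frac{\ell_i}{\max(p_i, \ell_i)}$$

where $S_i$ is the binary success indicator, $\ell_i$ the shortest-path distance to the goal, and $p_i$ the agent's actual path length.)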

ajzhai commented 9 months ago

Yes, it is v0.1 (see the README).

Jaraxxus-Me commented 9 months ago

Thanks, v0.1 gives results close to yours.