dvlab-research / FocalsConv

Focal Sparse Convolutional Networks for 3D Object Detection (CVPR 2022, Oral)
https://arxiv.org/abs/2204.12463
Apache License 2.0

How to compute the params and runtime (inference time)? #9

Closed Jane-QinJ closed 1 year ago

Jane-QinJ commented 2 years ago

Dear author, first of all, thanks for your great work. After reading your paper, I would really like to know how to calculate the params and the runtime of adding Focals Conv to Voxel R-CNN, as you mentioned in your experiments. I want to try it myself, but I don't know how: there is little information on the Internet, and searching Google only left me confused. Could you share the code for this, if you have it? I would very much appreciate your help. Thank you in advance.

yukang2017 commented 2 years ago

Hi,

For params, we can count them layer by layer: the weight tensor of a 3D convolution has shape (Cin, Cout, K, K, K), and that of a 2D convolution (Cin, Cout, K, K). It is not a hard job.
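For example, something like this (bias terms omitted; if a layer has a bias, add Cout to its count):

def conv3d_params(c_in, c_out, k):
    # weight tensor of shape (Cin, Cout, K, K, K)
    return c_in * c_out * k ** 3

def conv2d_params(c_in, c_out, k):
    # weight tensor of shape (Cin, Cout, K, K)
    return c_in * c_out * k ** 2

# e.g. a 3x3x3 convolution from 16 to 32 channels
print(conv3d_params(16, 32, 3))  # 13824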

A simpler way to count params is to check the file size of the model dict directly (remember to remove unused keys from the checkpoint and keep only 'model_state'). The number of params is the file size in bytes divided by 4, because the model is usually stored in float32: 1M params * 32 bit = 32 Mbit = 4 MB, since 1 byte = 8 bit.
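A rough sketch of this check (the checkpoint path is a placeholder, and torch.save adds a little pickle overhead, so treat the result as approximate):

import os
import torch

# Keep only the model weights before measuring the file size.
ckpt = torch.load('checkpoint.pth', map_location='cpu')
torch.save(ckpt['model_state'], 'model_state_only.pth')

# float32 weights take 4 bytes per parameter
size_bytes = os.path.getsize('model_state_only.pth')
print(f"~{size_bytes / 4 / 1e6:.2f}M parameters")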

For runtime, you can use a timer to measure the forward time of the network. For example, place one time.time() at the beginning of the forward function and another at the end (just before the return), and average over the whole validation set.
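Equivalently, you can time the call site instead of editing forward(). A minimal sketch, where model and val_loader are placeholders; torch.cuda.synchronize() makes sure asynchronous GPU work is included in the measurement:

import time
import torch

model.eval()
times = []
with torch.no_grad():
    for batch_dict in val_loader:
        torch.cuda.synchronize()  # wait for pending GPU work before starting the timer
        start = time.time()
        model(batch_dict)
        torch.cuda.synchronize()  # wait for the forward pass to actually finish
        times.append(time.time() - start)
print(f"average forward time: {sum(times) / len(times) * 1000:.1f} ms")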

yukang2017 commented 2 years ago

I close this issue for now. Feel free to reopen if there are any other issues.

konyul commented 2 years ago

How can I see the remaining time for training?

ThomasPDM commented 1 year ago

I reproduced the results of the paper by training the PV-RCNN and Voxel R-CNN based multimodal models on the KITTI dataset. Along the way, I checked the number of parameters and found that it was much larger than reported in the paper. After loading the provided checkpoints, I found that they do not include the layers from "backbone_3d.semseg.ifn.model". I do not understand why my models have these layers while the provided checkpoints do not. See how this affects the sizes:

| MODEL | Whole | backbone_3d.semseg.ifn.model | Focals Conv |
| --- | --- | --- | --- |
| Voxel R-CNN multimodal multiclasses | 47.57M | 39.7M | 7.87M |
| PV-RCNN multimodal multiclasses | 53.09M | 39.7M | 13.39M |

What are these layers and why are they not included in the provided checkpoints?


import torch
from pcdet.config import cfg, cfg_from_yaml_file
from pcdet.datasets import build_dataloader
from pcdet.models import build_network
from pcdet.utils import common_utils

if __name__ == '__main__':
    # Load model config (voxel_rcnn_multiclasses_focalsconv.yaml for Voxel R-CNN)
    cfg_from_yaml_file('cfgs/kitti_models/pv_rcnn_focal_multimodal.yaml', cfg)

    # Init logger
    log_file = cfg.ROOT_DIR / 'output' / 'log_modelsize.txt'
    logger = common_utils.create_logger(log_file, rank=cfg.LOCAL_RANK)

    # Init dataset
    test_set, test_loader, test_sampler = build_dataloader(
        dataset_cfg=cfg.DATA_CONFIG,
        class_names=cfg.CLASS_NAMES,
        batch_size=1,
        dist=False, workers=4,
        logger=logger,
        training=False,
        merge_all_iters_to_one_epoch=False,
        total_epochs=1
    )

    # Build model
    model = build_network(model_cfg=cfg.MODEL, 
                          num_class=len(cfg.CLASS_NAMES), 
                          dataset=test_set)

    # Init variables to store sizes
    focalsconv_size = 0
    ifn_size = 0

    # Browse model layers
    for name, param in model.state_dict().items():
        num_params = param.numel()
        if name.startswith("backbone_3d.semseg.ifn.model"):
            ifn_size += num_params
        else:
            focalsconv_size += num_params
    model_size = focalsconv_size + ifn_size

    # Show sizes
    print(f"Whole Size: {round((model_size)/10e5, 2)}M parameters")
    print(f"...ifn.model Size: {round(ifn_size/10e5, 2)}M parameters")
    print(f"Focals Conv Size: {round(focalsconv_size/10e5, 2)}M parameters")

I ran the script from the "tools" directory; you can run it from elsewhere, but then you will need to update the path to the configuration file (.yaml).

yukang2017 commented 1 year ago

Hi @ThomasPDM ,

After double-checking, I realized what the problem is.

backbone_3d.semseg is actually initialized from a 2D image segmentation model, DeepLabv3 with a ResNet50 backbone.

You can see it from here. https://github.com/dvlab-research/FocalsConv/blob/2202be495b8a00af4a9a46e087e415f3f9533823/OpenPCDet/pcdet/models/backbones_3d/SemanticSeg/sem_deeplabv3.py#L158

"https://pytorch.org/vision/stable/models/generated/torchvision.models.segmentation.deeplabv3_resnet50.html#torchvision.models.segmentation.deeplabv3_resnet50".

However, we actually use only the first several layers of this model, up to "layer1". The other layers in the deeplabv3_resnet50 are not used in our model.

You can see this argument here. https://github.com/dvlab-research/FocalsConv/blob/2202be495b8a00af4a9a46e087e415f3f9533823/OpenPCDet/pcdet/models/backbones_3d/spconv_backbone_focal.py#L132

Thus, I deleted the other layers in the deeplabv3_resnet50. That is why my pre-trained weights are smaller than yours.

Regards, Yukang Chen

ThomasPDM commented 1 year ago

Thanks for the quick answer. I do not understand how to remove these parameters from the whole model. I did not change anything except for the script that I added to build the model and display the number of parameters. To be sure, I retried from scratch on Ubuntu 20.04 with Conda:

# Get the repository
git clone https://github.com/dvlab-research/FocalsConv.git
cd FocalsConv
# Setup conda environment
conda create -n focalsconv python==3.8
conda activate focalsconv
conda install pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch -c conda-forge
cd OpenPCDet
pip install -r requirements.txt
# Set env path into LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/path/to/conda/envs/focalsconv/lib:$LD_LIBRARY_PATH
# Install OpenPCDet
python setup.py develop --user
# Set access to the KITTI dataset from FocalsConv/OpenPCDet/data
ln -s /path/to/KITTI/data/training data/kitti/training
ln -s /path/to/KITTI/data/testing data/kitti/testing
ln -s /path/to/KITTI/data/ImageSets data/kitti/ImageSets
# Generate preprocessed data for OpenPCDet
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
# Create the output directory
mkdir output
# Copy and run the script (provided above)
cd tools
cp /path/to/the/size_script.py size_script.py
python size_script.py

The results are the same as above, even when I load the provided checkpoints by adding this after building the model:

# Load checkpoints (for PV-RCNN here, but same kind of results with Voxel R-CNN)
ckpt = "/path/to/provided/checkpoints/pvrcnn_focal_multimodal.pth"
model.load_params_from_file(filename=ckpt, logger=logger, to_cpu=False)
model.cuda()

Then the weights for "backbone_3d.semseg.ifn.model" layers are not updated:

INFO  ==> Checkpoint trained from version: pcdet+0.3.0+0000000
INFO  Not updated weight backbone_3d.semseg.ifn.model.backbone.layer2.0.conv1.weight: torch.Size([128, 256, 1, 1])
...
INFO  Not updated weight backbone_3d.semseg.ifn.model.classifier.4.bias: torch.Size([21])
INFO  ==> Done (loaded 467/763)

How can I build the model without these layers? How did you manage to delete the other layers in the deeplabv3_resnet50? Do you iterate over the children of the model to create a copy without these layers?

Regards

yukang2017 commented 1 year ago

Hi,

For me, I changed the _segm_resnet function in path_to/site-packages/torchvision/models/segmentation/segmentation.py as below.

[screenshot of the modified _segm_resnet function]

There are two modifications:
(1) return_layers = {'layer1': 'out'}  # was {'layer4': 'out'}
(2) classifier = None  # was model_map[name][0](inplanes, num_classes)
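In case it helps, here is a standalone sketch of what those two edits amount to (not the exact library patch; IntermediateLayerGetter is the torchvision helper that _segm_resnet uses internally):

import torch
from torchvision.models import resnet50
from torchvision.models._utils import IntermediateLayerGetter

# Build the same ResNet-50 body that deeplabv3_resnet50 uses, but stop the
# feature extraction at 'layer1' and keep no segmentation classifier.
body = resnet50(pretrained=False, replace_stride_with_dilation=[False, True, True])
backbone = IntermediateLayerGetter(body, return_layers={'layer1': 'out'})

feats = backbone(torch.randn(1, 3, 224, 224))['out']
print(feats.shape)  # torch.Size([1, 256, 56, 56])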

I know that this might not be the best way. You can also rewrite it in other ways.

Regards, Yukang Chen

ThomasPDM commented 1 year ago

Hi,

Instead of modifying the library locally, I suggest adding a few lines of code in OpenPCDet/pcdet/models/backbones_3d/SemanticSeg/sem_deeplabv3.py:

class SemDeepLabV3(SegTemplate):
    def __init__(self, backbone_name, **kwargs):
        ...
        super().__init__(constructor=constructor, **kwargs)

        # We actually use only the first several layers of this model
        if backbone_name == "ResNet50":
            backbone_blocks = ["conv1", "bn1", "relu", "maxpool", "layer1"]
            new_model = next(self.model.children()) # keep the backbone
            self.model = nn.Sequential() # reset the model
            for name, module in new_model.named_children():
                if name in backbone_blocks: # keep useful blocks
                    self.model.add_module(name=name, module=module)

Be careful: I may have added or forgotten a relevant block, as this does not give exactly the same number of parameters as your suggested modification. Anyway, it may serve as inspiration. Also, in my opinion, you should update your code so that everyone can build the correctly sized model (or at least explain what you did in the README.md).

Here is an update to the table that I provided above:

| MODEL | Whole | backbone_3d.semseg.ifn.model | Focals Conv |
| --- | --- | --- | --- |
| Voxel R-CNN multimodal multiclasses | 8.10M | 0.23M | 7.87M |
| PV-RCNN multimodal multiclasses | 13.62M | 0.23M | 13.39M |

Thanks for the answers,

Regards

yukang2017 commented 1 year ago

Thanks for the reminder. I will add this information to the README.md.