DerrickXuNu / OpenCOOD

[ICRA 2022] An opensource framework for cooperative detection. Official implementation for OPV2V.
https://mobility-lab.seas.ucla.edu/opv2v/

multi-gpus on single machine #27

Closed linchunmian closed 2 years ago

linchunmian commented 2 years ago

Hi, how should I train the model with multiple GPUs on a single machine? Following the nn.DataParallel approach, I get a tensor dimension mismatch error (in an xx.view call) in the AttFusion class in self_attn.py. Please help me.

DerrickXuNu commented 2 years ago

Hi, training on multiple GPUs needs torch.distributed. I have made it work on my local machine, and I will release that version in the near future.

linchunmian commented 2 years ago

Thanks. How should I modify the train script if I want to implement this right now? Are the changes limited to the train script?

DerrickXuNu commented 2 years ago

Yes, you only need to modify train.py; there is nothing else you need to change.
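
The changes are roughly along these lines (just a sketch of the idea, not the exact version I will release):

'''
# Hypothetical sketch of the DDP-related additions to train.py.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler


def wrap_for_ddp(model, train_dataset, batch_size, collate_fn=None):
    """Initialise the process group, shard the dataset, and wrap the model."""
    # torch.distributed.launch provides the rank via --local_rank or the
    # LOCAL_RANK env var, depending on the PyTorch version; the env var is
    # assumed here.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # Each process sees a disjoint shard of the dataset; shuffling is handled
    # by the sampler, so the DataLoader must not shuffle again.
    train_sampler = DistributedSampler(train_dataset)
    train_loader = DataLoader(train_dataset,
                              batch_size=batch_size,
                              sampler=train_sampler,
                              collate_fn=collate_fn)

    model = DDP(model.cuda(local_rank), device_ids=[local_rank])
    return model, train_loader, train_sampler


# Launch one process per GPU, e.g.:
#   CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
#       --nproc_per_node=2 opencood/tools/train.py --hypes_yaml <config>
# and call train_sampler.set_epoch(epoch) at the start of every epoch.
'''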

linchunmian commented 2 years ago

Also, do I need to change anything about model saving when adopting the distributed function? Another concern is how to customize my own V2V data based on the CARLA + SUMO simulation. Could you please share the general pipeline of data generation? Many thanks!

DerrickXuNu commented 2 years ago

No, you don't need to. As for creating your own dataset, that is beyond the scope of this repo.

linchunmian commented 2 years ago

Thanks. Do you mean the data generation process refers to https://github.com/ucla-mobility/OpenCDA?

DerrickXuNu commented 2 years ago

Yes

linchunmian commented 2 years ago

> No, you don't need to. For the question related to creating your own dataset, it is quite beyond the scope of this repo.

Thanks. I added distributed training to train.py following the PyTorch DistributedDataParallel function. However, the model fails to obtain valid AP results (AP@50=0.0 and AP@70=0.0) on the test data. What could cause this problem, in your experience?

DerrickXuNu commented 2 years ago

Try one thing: when you load the model from the checkpoint, map it to the CPU first.

linchunmian commented 2 years ago

Thanks, but nothing seems to change. I only added the distributed function without any other modifications. Should BatchNorm be replaced with SyncBatchNorm?

DerrickXuNu commented 2 years ago

If you train from a continued checkpoint, is the loss normal?

linchunmian commented 2 years ago

When I train from the checkpoint you provided, the command and the error are as follows:

'''
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 opencood/tools/train.py --hypes_yaml opencood/hypes_yaml/point_pillar_intermediate_fusion.yaml --model_dir models/pointpillar_attentive_fusion/pointpillar_attentive_fusion

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

INFO - 2022-07-09 09:32:40,141 - distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 1
INFO - 2022-07-09 09:32:40,149 - distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 0
Dataset Building
Dataset Building
Creating Model
Creating Model
Traceback (most recent call last):
  File "opencood/tools/train.py", line 207, in <module>
    main()
  File "opencood/tools/train.py", line 95, in main
    init_epoch, model = train_utils.load_saved_model(saved_path, model)
  File "/mnt/1c19fbb2-4609-4579-9be5-7e8b872cfcd7/projects/cooper/OpenCOOD/opencood/tools/train_utils.py", line 50, in load_saved_model
    'latest.pth')))
  File "/home/admin1/anaconda3/envs/opencood/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1224, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DistributedDataParallel:
  Missing key(s) in state_dict: "module.pillar_vfe.pfn_layers.0.linear.weight", "module.pillar_vfe.pfn_layers.0.norm.weight", ..., "module.cls_head.weight", "module.cls_head.bias", "module.reg_head.weight", "module.reg_head.bias".
  Unexpected key(s) in state_dict: "pillar_vfe.pfn_layers.0.linear.weight", "pillar_vfe.pfn_layers.0.norm.weight", ..., "cls_head.weight", "cls_head.bias", "reg_head.weight", "reg_head.bias".
Killing subprocess 488807
Killing subprocess 488808
Traceback (most recent call last):
  File "/home/admin1/anaconda3/envs/opencood/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/admin1/anaconda3/envs/opencood/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/admin1/anaconda3/envs/opencood/lib/python3.7/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/home/admin1/anaconda3/envs/opencood/lib/python3.7/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/home/admin1/anaconda3/envs/opencood/lib/python3.7/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/admin1/anaconda3/envs/opencood/bin/python', '-u', 'opencood/tools/train.py', '--local_rank=1', '--hypes_yaml', 'opencood/hypes_yaml/point_pillar_intermediate_fusion.yaml', '--model_dir', 'models/pointpillar_attentive_fusion/pointpillar_attentive_fusion']' returned non-zero exit status 1.
'''

All of the missing keys carry the "module." prefix that DistributedDataParallel adds, while the unexpected keys are the same names without the prefix. I did not modify the model configuration, and fine-tuning from the checkpoint works fine on a single GPU, so maybe my usage of the distributed functions is inappropriate. In that case, what should I do next? In my train.py, I only added the process-group initialization, the distributed data sampler, and the DistributedDataParallel wrapper:

'''
torch.distributed.init_process_group(backend='nccl')
train_sampler = torch.utils.data.distributed.DistributedSampler(opencood_train_dataset)
if torch.cuda.device_count() > 1:
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[opt.local_rank])
'''

Here 'opt.local_rank' uses the default value 0, and my hardware platform is two GPUs on a single machine. By the way, the 'config.yaml' from the checkpoint you provided is missing the 'train_params' key, which I added to the config myself when continuing to fine-tune from the checkpoint. Is that OK? Thanks in advance, and any help would be appreciated a lot.

DerrickXuNu commented 2 years ago

There are two things you can try. First, change "latest.pth" to net_epoch30.pth (any saved epoch number should work). Second, when you load the checkpoint, load it to the CPU first, and remember to set the strict flag to False.
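
Roughly something like this (just a sketch; the "module." prefix handling matches the missing/unexpected keys in your traceback and is not our released code):

'''
# Hypothetical loading helper, not the actual train_utils.load_saved_model.
import torch


def load_for_ddp(model, ckpt_path):
    """Load a single-GPU checkpoint into a DistributedDataParallel model."""
    # Map to CPU first so every rank can load it without touching cuda:0.
    state_dict = torch.load(ckpt_path, map_location="cpu")

    # A DDP-wrapped model expects keys prefixed with "module."; a checkpoint
    # saved from a plain model lacks them, which is exactly the
    # missing/unexpected key mismatch shown in the traceback above.
    if not any(k.startswith("module.") for k in state_dict):
        state_dict = {"module." + k: v for k, v in state_dict.items()}

    # strict=False tolerates leftover buffers such as num_batches_tracked.
    model.load_state_dict(state_dict, strict=False)
    return model
'''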

linchunmian commented 2 years ago

Thanks, I have solved this problem by replacing 'torch.save(model.state_dict(), ...)' with 'torch.save(model.module.state_dict(), ...)' (a short sketch of that change is included after the questions below). Another concern is how to set the correct batch size and learning rate. I see the default values in the config file are bs=2 and lr=0.002.

  1. For the single-GPU case, should the learning rate be increased linearly with the batch size, i.e., bs=8 -> lr=0.008?
  2. For the multi-GPU case (e.g., 2 GPUs) with distributed training, should the batch size be counted across all GPUs, i.e., bs=8×2=16 -> lr=0.008×2=0.016? By the way, how do I determine the optimal number of training epochs? I do not see the related training parameters in the config file from the checkpoint dir. Thanks again!
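
For reference, the saving change looks roughly like this (the net_epoch naming follows the checkpoints above; the isinstance guard is just for illustration):

'''
# Hypothetical saving snippet: unwrap DDP so the checkpoint keeps plain keys.
import os
import torch
from torch.nn.parallel import DistributedDataParallel as DDP


def save_checkpoint(model, saved_path, epoch):
    # model.module is the underlying network when wrapped in DDP, so the saved
    # keys carry no "module." prefix and the checkpoint can be reloaded on a
    # single GPU without any key remapping.
    net = model.module if isinstance(model, DDP) else model
    torch.save(net.state_dict(),
               os.path.join(saved_path, 'net_epoch%d.pth' % epoch))
'''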

And I find that at the same epoch, the test AP achieved by the distributed model is clearly inferior to that of the single-GPU one. Is that expected? Are there any tricks for alleviating this performance gap?

DerrickXuNu commented 2 years ago
  1. The learning rate should stay the same, and the per-GPU batch size can stay the same. For example, bs=4 per GPU effectively becomes 8 with two GPUs.
  2. Your guess about the batch size is correct. However, the learning rate is the same whether you use a single GPU or multiple GPUs.
  3. I already forgot the optimal number of training epochs since it was nearly one year ago, but I usually check the validation loss and choose the epoch with the lowest value as the final one. You can refer to the default YAML files for training. I am currently using a multi-step learning rate schedule, but I think adding an annealing learning rate strategy should lead to better performance.

Regarding your question about the performance gap between single- and multi-GPU training, I guess you may have some overfitting in the multi-GPU training. One epoch in 2-GPU training equals two epochs of single-GPU training, so check your TensorBoard to see whether overfitting is happening.
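
To be concrete about point 1, the batch size you pass to each DataLoader is per process, so the effective batch size scales with the number of GPUs (illustration only, assuming one process per GPU):

'''
import torch.distributed as dist

per_gpu_batch_size = 4
world_size = dist.get_world_size() if dist.is_initialized() else 1  # e.g. 2
effective_batch_size = per_gpu_batch_size * world_size              # e.g. 8
print(effective_batch_size)
'''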

linchunmian commented 2 years ago

Many thanks. But I am still confused about what you mean:

  1. For one GPU, the default bs is 2 and the initial lr is 0.002 in the config YAML. If I adopt bs=8, should the initial learning rate be enlarged 4 times correspondingly (i.e., lr=0.008)?
  2. For two GPUs, the default bs=2 and lr=0.002 from the single-GPU setting imply a total batch size of 4, with the initial lr remaining 0.002 for each GPU, is that right? As for the learning rate strategy and the performance gap mentioned above, I will validate them further. Thanks again.
DerrickXuNu commented 2 years ago
  1. No, I don't see a reason why the LR needs to be adjusted for the number of GPUs. That doesn't make sense to me.
  2. Yes.
linchunmian commented 2 years ago

Many thanks. For the first question, I mean: does the learning rate increase with the batch size, not with the number of GPUs? One attempt I made was adopting a batch size of 8 with the same 0.002 initial learning rate in the single-GPU setting, and the test AP result was much poorer than that of bs=4 with lr=0.002.

DerrickXuNu commented 2 years ago

I don't think the LR needs to increase with the batch size. As I mentioned, the worse results may come from overfitting. You need to do early stopping.
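
Something like this is enough (just a sketch; run_epoch, validate, and the patience value stand in for your existing loops and are not anything from the repo):

'''
# Hypothetical early stopping keyed on the validation loss.
def train_with_early_stopping(run_epoch, validate, max_epoch=30, patience=3):
    """run_epoch(e) trains one epoch; validate(e) returns the validation loss."""
    best_val, best_epoch, bad_epochs = float("inf"), -1, 0
    for epoch in range(max_epoch):
        run_epoch(epoch)
        val_loss = validate(epoch)
        if val_loss < best_val:
            best_val, best_epoch, bad_epochs = val_loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation loss has stopped improving
    return best_epoch
'''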

linchunmian commented 2 years ago

Thanks. I found the model converges within 13-15 epochs; is that normal?

DerrickXuNu commented 2 years ago

Is that with multiple GPUs?

linchunmian commented 2 years ago

Both single- and multi-GPU; the number of training epochs is set to 15 with batch size 2 and an initial learning rate of 0.002. The pretrained model at epoch 13 or 15 reports the best validation and test AP results. I also enlarged the number of epochs to 30 under the same parameter setting, but the detection performance on the validation and test splits was substantially inferior to that of the model trained for 15 epochs. Is that strange?

DerrickXuNu commented 2 years ago

Which model are you using for training?

linchunmian commented 2 years ago

I train the model from scratch.

DerrickXuNu commented 2 years ago

Yeah, but which model did you train? Your own model or one of those I provided?

linchunmian commented 2 years ago

I train the PointPillar model with intermediate fusion from scratch.

linchunmian commented 2 years ago

Also, does the OPV2V dataset provide the transformation information between the lidar and the camera (i.e., lidar->camera and camera->lidar)?

DerrickXuNu commented 2 years ago

I have no idea about this for now. I suggest changing to an annealing learning rate strategy and seeing whether it gets better.
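
For example, something like a cosine annealer (just a sketch; the stand-in model, T_max, and eta_min are illustrative, not values from our YAML files):

'''
# Hypothetical sketch of an annealing learning rate schedule.
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(10, 2)  # stand-in for the detector
optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
scheduler = CosineAnnealingLR(optimizer, T_max=15, eta_min=1e-5)

for epoch in range(15):
    # ... run one training epoch with `optimizer` ...
    scheduler.step()  # smoothly decay the LR instead of discrete multi-steps
'''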

DerrickXuNu commented 2 years ago

> Also, does the OPV2V dataset provide the transformation information between lidar and camera (i.e. lidar->camera and camera->lidar)?

Yes, we do have the API. We plan to release it soon (probably next month).

linchunmian commented 2 years ago

Thanks.

  1. So how many epochs did you use when training the PointPillars model?
  2. Could you please briefly describe the pipeline for multiple modalities? For example, is the camera information from other CAVs projected to the ego vehicle via transformation parameters? Which parameters need to be used for inter-image transformation and lidar-camera projection? Thanks in advance!