V2AI / Det3D

World's first general-purpose 3D object detection codebase.
https://arxiv.org/abs/1908.09492
Apache License 2.0

No state_dict found in checkpoint file model_epoch_025_step_100100.pth #117

Closed pjohh closed 4 years ago

pjohh commented 4 years ago

Hello,

  1. I set up everything according to the Installation and Getting Started guides for NuScenes trainval. The only differences are:

    • OS: Ubuntu 18.04
    • Python: 3.6.9
    • PyTorch: 1.4
    • CUDA: 10.2
    • CUDNN: 7.6.5
  2. I ran the test for NuScenes trainval with the provided cbgs checkpoint and --nproc_per_node=1 (single-GPU setup):

    ./tools/scripts/test.sh examples/cbgs/configs/nusc_all_vfev3_spmiddleresnetfhd_rpn2_mghead_syncbn.py test/ /home/Downloads/model_epoch_025_step_100100.pth
  3. Result:

    2020-06-14 12:54:22,219 - INFO - Distributed testing: False
    2020-06-14 12:54:22,219 - INFO - torch.backends.cudnn.benchmark: False
    2020-06-14 12:54:22,298 - INFO - Finish RPN Initialization
    2020-06-14 12:54:22,299 - INFO - num_classes: [1, 2, 2, 1, 2, 2], num_preds: [18, 36, 36, 18, 36, 36], num_dirs: [4, 8, 8, 4, 8, 8]
    2020-06-14 12:54:22,302 - INFO - Finish MultiGroupHead Initialization
    Traceback (most recent call last):
    File "./tools/dist_test.py", line 187, in <module>
    main()
    File "./tools/dist_test.py", line 106, in main
    checkpoint = load_checkpoint(model, args.checkpoint, map_location="cpu")
    File "/home/workspace/Det3D/det3d/torchie/trainer/checkpoint.py", line 163, in load_checkpoint
    raise RuntimeError("No state_dict found in checkpoint file {}".format(filename))
    RuntimeError: No state_dict found in checkpoint file /home/Downloads/model_epoch_025_step_100100.pth
    Traceback (most recent call last):
    File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
    File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
    File "/home/.local/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
    File "/home/.local/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
    subprocess.CalledProcessError: Command '['/usr/bin/python3.6', '-u', './tools/dist_test.py', '--local_rank=0', 'examples/cbgs/configs/nusc_all_vfev3_spmiddleresnetfhd_rpn2_mghead_syncbn.py', '--work_dir=test/', '--checkpoint=/home/Downloads/model_epoch_025_step_100100.pth']' returned non-zero exit status 1.

    Running the test without torch.distributed.launch gives the same result:

    python3.6 tools/dist_test.py examples/cbgs/configs/nusc_all_vfev3_spmiddleresnetfhd_rpn2_mghead_syncbn.py --work_dir test/ --checkpoint /home/Downloads/model_epoch_025_step_100100.pth 
    2020-06-14 12:58:52,101 - INFO - Distributed testing: False
    2020-06-14 12:58:52,101 - INFO - torch.backends.cudnn.benchmark: False
    2020-06-14 12:58:52,177 - INFO - Finish RPN Initialization
    2020-06-14 12:58:52,178 - INFO - num_classes: [1, 2, 2, 1, 2, 2], num_preds: [18, 36, 36, 18, 36, 36], num_dirs: [4, 8, 8, 4, 8, 8]
    2020-06-14 12:58:52,181 - INFO - Finish MultiGroupHead Initialization
    Traceback (most recent call last):
    File "tools/dist_test.py", line 187, in <module>
    main()
    File "tools/dist_test.py", line 106, in main
    checkpoint = load_checkpoint(model, args.checkpoint, map_location="cpu")
    File "/home/pjoh/workspace/Det3D/det3d/torchie/trainer/checkpoint.py", line 163, in load_checkpoint
    raise RuntimeError("No state_dict found in checkpoint file {}".format(filename))
    RuntimeError: No state_dict found in checkpoint file /home/Downloads/model_epoch_025_step_100100.pth

    Is the provided checkpoint faulty or not compatible with the current version of Det3D or am I doing something wrong?
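One way to narrow this down is to look at what the checkpoint file actually contains before handing it to the loader. The following is a minimal sketch, not Det3D code: it assumes the loaded checkpoint is either a bare mapping of parameter names to tensors or a dict that nests such a mapping under some key. The helper `find_state_dict` and the candidate keys other than `state_dict` are hypothetical names chosen for illustration.

```python
from collections.abc import Mapping

# Hypothetical candidate keys; only "state_dict" is named by the error message.
CANDIDATE_KEYS = ("state_dict", "model", "model_state_dict")


def find_state_dict(checkpoint):
    """Return (key, mapping) for the first nested mapping that looks like weights.

    key is None when the checkpoint itself appears to be the weight mapping.
    """
    if not isinstance(checkpoint, Mapping):
        raise TypeError(
            "unexpected checkpoint type: {}".format(type(checkpoint).__name__)
        )
    for key in CANDIDATE_KEYS:
        value = checkpoint.get(key)
        if isinstance(value, Mapping):
            return key, value
    # Heuristic: parameter names are dotted strings such as "rpn.conv.weight".
    if checkpoint and all(isinstance(k, str) and "." in k for k in checkpoint):
        return None, checkpoint
    raise KeyError(
        "no weight mapping found; top-level keys: {}".format(
            sorted(map(str, checkpoint))
        )
    )
```

With PyTorch installed, the object to inspect would come from `torch.load(path, map_location="cpu")`. If the helper raises `KeyError`, the listed top-level keys show how the file is organized.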

poodarchu commented 4 years ago

The provided checkpoint is from my old version of Det3D, so it cannot be used directly, but it can be used after some necessary renaming.
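The thread does not say which names have to change, so the following is only a sketch of what such a renaming could look like in general, assuming the weights are a mapping from parameter names to tensors and that the difference is in the name prefixes. The function `rename_keys` and any prefix pairs passed to it are hypothetical; the real old and new names would have to be read from the checkpoint and from the current model.

```python
from collections import OrderedDict


def rename_keys(state_dict, prefix_map):
    """Return a new OrderedDict with key prefixes replaced according to prefix_map.

    prefix_map maps an old prefix to its replacement; the first matching
    prefix wins, and keys without a match are copied unchanged.
    """
    renamed = OrderedDict()
    for name, value in state_dict.items():
        new_name = name
        for old, new in prefix_map.items():
            if name.startswith(old):
                new_name = new + name[len(old):]
                break
        if new_name in renamed:
            raise ValueError("duplicate key after renaming: {}".format(new_name))
        renamed[new_name] = value
    return renamed
```

A renamed mapping could then be written back with `torch.save({"state_dict": renamed}, new_path)`. That the loader accepts this layout is an assumption based on the wording of the error message.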

pjohh commented 4 years ago

@poodarchu thanks for your reply. Could you please point me to a version of Det3D that works with the checkpoint, do the renaming yourself, or explain what I have to rename and how?

Thanks in advance for your help!

poodarchu commented 4 years ago

I haven't released the source code of my old version.

pjohh commented 4 years ago

So can you please tell me how to get it working with the current version of Det3D?
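Without an answer in the thread, one way to find out what would need renaming is to compare the parameter names the current model expects with the names stored in the checkpoint. This is a minimal sketch under that assumption; `diff_keys` is a hypothetical helper, not part of Det3D.

```python
def diff_keys(model_keys, checkpoint_keys):
    """Report parameter names present on only one side, sorted for readability."""
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    return {
        "missing_in_checkpoint": sorted(model_keys - checkpoint_keys),
        "unexpected_in_checkpoint": sorted(checkpoint_keys - model_keys),
    }
```

For a PyTorch model, the first argument would be `model.state_dict().keys()` and the second the keys of the weight mapping found in the checkpoint file. Names that differ only by a prefix are likely candidates for renaming.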