OpenDriveLab / TopoNet

Topology Reasoning for Scene Perception in Autonomous Driving
Apache License 2.0

subset-B val performance of ckpt #13

ZhouYC-X closed this issue 9 months ago

ZhouYC-X commented 9 months ago

Thank you for your great work. I recently downloaded the checkpoint for subset B and evaluated it on subset-B val. However, the results differ significantly from those reported in the repo.

OpenLane-V2 Score - 0.19636668752454212
    DET_l - 0.13499410450458527
    DET_t - 0.1379755288362503
    TOP_ll - 0.043555243510227305
    TOP_lt - 0.09229333809205244
F-Score for 3D Lane - 0.10671664825889586
{'OpenLane-V2 Score': 0.19636668752454212, 'DET_l': 0.1349941, 'DET_t': 0.13797553, 'TOP_ll': 0.043555243510227305, 'TOP_lt': 0.09229333809205244}

I checked the dependencies relevant to evaluation but could not find the cause. Could you please give me some advice? Looking forward to your reply. Thank you.

openlanev2 1.1.0
mmcv-full 1.5.2
mmdet 2.26.0
mmdet3d 1.0.0rc6
mmsegmentation 0.29.1
torch 1.9.1+cu111
torchvision 0.10.1+cu111
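To gather the same version list programmatically (for example, to compare two environments side by side, as requested later in this thread), here is a minimal sketch using the standard-library importlib.metadata. The distribution names are assumed to match what pip reports for the packages listed above.

```python
from importlib import metadata  # stdlib since Python 3.8

# Distribution names as pip would list them (assumed to match the list above).
PACKAGES = [
    "openlanev2", "mmcv-full", "mmdet", "mmdet3d",
    "mmsegmentation", "torch", "torchvision",
]


def collect_versions(packages):
    """Return {name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in collect_versions(PACKAGES).items():
        print(f"{name} {version or 'NOT INSTALLED'}")
```

Running it in both environments and diffing the output is a quick first check before comparing full conda exports.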

sephyli commented 9 months ago

Sorry, I could not reproduce this issue. My evaluation gives:

OpenLane-V2 Score - 0.35960268760767244
    DET_l - 0.2435305118560791
    DET_t - 0.5265672206878662
    TOP_ll - 0.06714055449349493
    TOP_lt - 0.1674430632156414
F-Score for 3D Lane - 0.13894095268823822
{'OpenLane-V2 Score': 0.35960268760767244, 'DET_l': 0.24353051, 'DET_t': 0.5265672, 'TOP_ll': 0.06714055449349493, 'TOP_lt': 0.1674430632156414}
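Both sets of numbers in this thread are consistent with the OpenLane-V2 score being the mean of the two detection metrics and the square roots of the two topology metrics: OLS = (DET_l + DET_t + sqrt(TOP_ll) + sqrt(TOP_lt)) / 4. This formula was checked against the values above, not copied from the OpenLane-V2 code. Recomputing the score shows which term drives the gap. DET_t falls from about 0.527 to 0.138, which accounts for well over half of the 0.16 drop in the overall score.

```python
import math


def openlane_v2_score(det_l, det_t, top_ll, top_lt):
    """OLS = (DET_l + DET_t + sqrt(TOP_ll) + sqrt(TOP_lt)) / 4."""
    return (det_l + det_t + math.sqrt(top_ll) + math.sqrt(top_lt)) / 4


# Sub-metrics copied from the two evaluation outputs in this thread.
maintainer_result = openlane_v2_score(0.2435305118560791, 0.5265672206878662,
                                      0.06714055449349493, 0.1674430632156414)
initial_result = openlane_v2_score(0.13499410450458527, 0.1379755288362503,
                                   0.043555243510227305, 0.09229333809205244)
print(f"{maintainer_result:.4f} {initial_result:.4f}")  # prints "0.3596 0.1964"
```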

Can you provide the full log of the evaluation procedure?

ZhouYC-X commented 9 months ago

Thank you for your prompt response.

Here is the log of the first, failed evaluation run.

WARNING:__main__:*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
load checkpoint from local path: ckpt/toponet_r50_8x1_24e_olv2_subset_B.pth
load checkpoint from local path: ckpt/toponet_r50_8x1_24e_olv2_subset_B.pth
load checkpoint from local path: ckpt/toponet_r50_8x1_24e_olv2_subset_B.pth
load checkpoint from local path: ckpt/toponet_r50_8x1_24e_olv2_subset_B.pth
[                                                  ] 0/6019, elapsed: 0s, ETA:/opt/anaconda3/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)

[                              ] 1/6019, 0.0 task/s, elapsed: 72s, ETA: 435457s
[                              ] 2/6019, 0.0 task/s, elapsed: 72s, ETA: 217694s
[                              ] 3/6019, 0.0 task/s, elapsed: 72s, ETA: 145105s
[                              ] 4/6019, 0.1 task/s, elapsed: 72s, ETA: 108811s/opt/anaconda3/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
/opt/anaconda3/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
[                               ] 5/6019, 0.1 task/s, elapsed: 73s, ETA: 87223s
[                               ] 6/6019, 0.1 task/s, elapsed: 73s, ETA: 72674s
[                               ] 7/6019, 0.1 task/s, elapsed: 73s, ETA: 62281s
[                               ] 8/6019, 0.1 task/s, elapsed: 73s, ETA: 54487s
[                               ] 9/6019, 0.1 task/s, elapsed: 73s, ETA: 48508s
[                              ] 10/6019, 0.1 task/s, elapsed: 73s, ETA: 43650s
[                              ] 11/6019, 0.2 task/s, elapsed: 73s, ETA: 39675s
[                              ] 12/6019, 0.2 task/s, elapsed: 73s, ETA: 36363s/opt/anaconda3/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)

[                              ] 13/6019, 0.2 task/s, elapsed: 73s, ETA: 33621s
[                              ] 14/6019, 0.2 task/s, elapsed: 73s, ETA: 31214s

[>>>>>>>>>>>>>>>>>>>>>>>>> ] 6014/6019, 25.3 task/s, elapsed: 238s, ETA:     0s
[>>>>>>>>>>>>>>>>>>>>>>>>> ] 6015/6019, 25.3 task/s, elapsed: 238s, ETA:     0s
[>>>>>>>>>>>>>>>>>>>>>>>>> ] 6016/6019, 25.3 task/s, elapsed: 238s, ETA:     0s2024-01-23 14:31:20,166 - mmdet - INFO - Starting format results...
2024-01-23 14:39:55,273 - mmdet - INFO - Starting openlanev2 evaluate...
Traceback (most recent call last):
  File "tools/test.py", line 266, in <module>
    main()
  File "tools/test.py", line 262, in main
    print(dataset.evaluate(outputs, **eval_kwargs))
  File "/code/TopoLane/TopoNet-main/projects/toponet/datasets/openlanev2_subset_A_dataset.py", line 364, in evaluate
    metric_results = openlanev2_evaluate(gt_dict, pred_dict)
  File "/code/TopoLane/OpenLane-V2/openlanev2/evaluation/evaluate.py", line 561, in evaluate
    preds[token] = predictions[token]['predictions']
KeyError: ('val', '11149', '1542799760912460')
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 105818) of binary: /opt/anaconda3/bin/python
/opt/anaconda3/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py:367: UserWarning: 

**********************************************************************
               CHILD PROCESS FAILED WITH NO ERROR_FILE                
**********************************************************************
CHILD PROCESS FAILED WITH NO ERROR_FILE
Child process 105818 (local_rank 0) FAILED (exitcode 1)
Error msg: Process failed with exitcode 1
Without writing an error file to <N/A>.
While this DOES NOT affect the correctness of your application,
no trace information about the error will be available for inspection.
Consider decorating your top level entrypoint function with
torch.distributed.elastic.multiprocessing.errors.record. Example:

  from torch.distributed.elastic.multiprocessing.errors import record

  @record
  def trainer_main(args):
     # do train
**********************************************************************
  warnings.warn(_no_error_file_warning_msg(rank, failure))
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/anaconda3/lib/python3.8/site-packages/torch/distributed/run.py", line 702, in <module>
    main()
  File "/opt/anaconda3/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
    return f(*args, **kwargs)
  File "/opt/anaconda3/lib/python3.8/site-packages/torch/distributed/run.py", line 698, in main
    run(args)
  File "/opt/anaconda3/lib/python3.8/site-packages/torch/distributed/run.py", line 689, in run
    elastic_launch(
  File "/opt/anaconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/anaconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
***************************************
          tools/test.py FAILED         
=======================================
Root Cause:
[0]:
  time: 2024-01-23_14:40:04
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 105818)
  error_file: <N/A>
  msg: "Process failed with exitcode 1"
=======================================
Other Failures:
  <NO_OTHER_FAILURES>
***************************************
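The floor_divide UserWarning that repeats throughout the log above is only a deprecation notice. The run actually fails on the KeyError at the end of the traceback. The warning matters only for negative operands, where rounding toward zero ('trunc', the behavior the warning describes for this PyTorch version) and rounding down ('floor') give different results. Below is a plain-Python illustration of the two modes that torch.div(a, b, rounding_mode=...) offers. It is an analogue for explanation, not PyTorch code.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division rounding toward zero (like rounding_mode='trunc')."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def floor_div(a: int, b: int) -> int:
    """Integer division rounding toward negative infinity (like rounding_mode='floor')."""
    return a // b


for a, b in [(7, 2), (-7, 2)]:
    print(a, b, trunc_div(a, b), floor_div(a, b))
# 7 2 3 3
# -7 2 -3 -4
```

For non-negative operands the two modes agree. That is why such warnings are often harmless, although whether they are harmless here depends on the operands at the call site.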

Based on the error message, I added the following check to OpenLane-V2/openlanev2/evaluation/evaluate.py:

for token in ground_truth.keys():
    if token not in predictions.keys():  # skip frames with no prediction (works around the KeyError at line 561)
        continue
    gts[token] = ground_truth[token]['annotation']
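Skipping missing tokens silences the KeyError, but it also shrinks the evaluation set (the evaluation later reports 6016 of 6019 frames). The resulting score is then no longer computed over the full validation split. The sketch below lists the mismatched tokens so they can be inspected before filtering. It assumes `ground_truth` and `predictions` are the dicts used in evaluate.py, keyed by token tuples like the one in the KeyError; `diff_tokens` is a hypothetical helper, not part of openlanev2.

```python
def diff_tokens(ground_truth, predictions):
    """Return (missing, extra): tokens only in ground_truth / only in predictions."""
    gt_keys = set(ground_truth)
    pred_keys = set(predictions)
    missing = sorted(gt_keys - pred_keys)
    extra = sorted(pred_keys - gt_keys)
    return missing, extra


# Toy dicts keyed by (split, segment_id, timestamp) tuples, the same shape as
# the key in the KeyError above; the values are placeholders.
gt = {("val", "11149", "1542799760912460"): {}, ("val", "00000", "1"): {}}
pred = {("val", "00000", "1"): {}}
missing, extra = diff_tokens(gt, pred)
print(f"{len(missing)} frames without predictions: {missing}")
```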

This produced the second evaluation log below:

WARNING:__main__:*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
load checkpoint from local path: ckpt/toponet_r50_8x1_24e_olv2_subset_B.pth
load checkpoint from local path: ckpt/toponet_r50_8x1_24e_olv2_subset_B.pth
load checkpoint from local path: ckpt/toponet_r50_8x1_24e_olv2_subset_B.pth
load checkpoint from local path: ckpt/toponet_r50_8x1_24e_olv2_subset_B.pth
[                                                  ] 0/6019, elapsed: 0s, ETA:/opt/anaconda3/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
/opt/anaconda3/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
/opt/anaconda3/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
/opt/anaconda3/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)

[                              ] 1/6019, 0.0 task/s, elapsed: 70s, ETA: 421610s
[                              ] 2/6019, 0.0 task/s, elapsed: 70s, ETA: 210771s

[>>>>>>>>>>>>>>>>>>>>>>>>> ] 6015/6019, 25.8 task/s, elapsed: 233s, ETA:     0s
[>>>>>>>>>>>>>>>>>>>>>>>>> ] 6016/6019, 25.8 task/s, elapsed: 233s, ETA:     0s2024-01-23 14:53:24,340 - mmdet - INFO - Starting format results...
2024-01-23 15:01:46,309 - mmdet - INFO - Starting openlanev2 evaluate...
len(gkeys):6019, len(pkeys):6016
len(gts.keys()):6016, len(preds.keys()):6016

calculating distances::   0%|                          | 0/6016 [00:00<?, ?it/s]
calculating distances::   0%|                  | 1/6016 [00:00<28:39,  3.50it/s]

calculating distances:: 100%|███████████████| 6016/6016 [26:23<00:00,  3.54it/s]
calculating distances:: 100%|███████████████| 6016/6016 [26:23<00:00,  3.80it/s]
/opt/anaconda3/lib/python3.8/site-packages/scipy/interpolate/_interpolate.py:641: RuntimeWarning: divide by zero encountered in true_divide
  slope = (y_hi - y_lo) / (x_hi - x_lo)[:, None]
/opt/anaconda3/lib/python3.8/site-packages/scipy/interpolate/_interpolate.py:641: RuntimeWarning: invalid value encountered in true_divide
  slope = (y_hi - y_lo) / (x_hi - x_lo)[:, None]
OpenLane-V2 Score - 0.19636668752454212
    DET_l - 0.13499410450458527
    DET_t - 0.1379755288362503
    TOP_ll - 0.043555243510227305
    TOP_lt - 0.09229333809205244
F-Score for 3D Lane - 0.10671664825889586
{'OpenLane-V2 Score': 0.19636668752454212, 'DET_l': 0.1349941, 'DET_t': 0.13797553, 'TOP_ll': 0.043555243510227305, 'TOP_lt': 0.09229333809205244}

I downloaded the OpenLane-V2 data from OpenDataLab and set up the environment following the TopoNet and OpenLane-V2 repositories. I also just ran the evaluation on subset-A val, and it shows a performance gap as well, which is very strange.

Could you share your conda environment information so that I can reconfigure my setup and run the tests again?

sephyli commented 9 months ago

File "/code/TopoLane/TopoNet-main/projects/toponet/datasets/openlanev2_subset_A_dataset.py", line 364

I notice that you are using the subset-A dataset to evaluate a model trained on subset B. Please use the config and code we provided to reproduce the results.
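The dataset file in the traceback and the checkpoint name both encode the subset, so a mismatch like this one can be caught before a long evaluation run. Below is a hypothetical pre-flight check. It assumes the subset is always spelled subset_A / subset_B in file names, as it is in the log. A more robust check would read the dataset type from the config passed to tools/test.py.

```python
import re

# Hypothetical naming convention inferred from the paths in this thread.
_SUBSET_RE = re.compile(r"subset_([AB])(?![A-Za-z])", re.IGNORECASE)


def subset_of(path):
    """Return 'A' or 'B' if the path mentions subset_A / subset_B, else None."""
    match = _SUBSET_RE.search(path)
    return match.group(1).upper() if match else None


def subsets_match(dataset_path, ckpt_path):
    """False only when both paths name a subset and the subsets differ."""
    ds, ck = subset_of(dataset_path), subset_of(ckpt_path)
    return ds is None or ck is None or ds == ck


dataset_file = "projects/toponet/datasets/openlanev2_subset_A_dataset.py"
ckpt_file = "ckpt/toponet_r50_8x1_24e_olv2_subset_B.pth"
if not subsets_match(dataset_file, ckpt_file):
    print(f"subset mismatch: dataset={subset_of(dataset_file)}, ckpt={subset_of(ckpt_file)}")
```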

ZhouYC-X commented 9 months ago

I checked the first failure log, and I am using the correct config files. (Actually, I modified the log to hide some personal information.)

File "/code/TopoLane/TopoNet-main/projects/toponet/datasets/openlanev2_subset_A_dataset.py", line 364

I set up a clean environment from scratch and reproduced the results reported in the repo. I suspect the issue was caused by conflicts between dependent packages or by multiple installations of openlanev2.
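If more than one copy of openlanev2 is importable, evaluation may silently run against a different copy than expected. One example is a pip-installed wheel alongside an editable checkout of the OpenLane-V2 repo whose evaluate.py was patched. `python -c "import openlanev2; print(openlanev2.__file__)"` shows which copy wins. The rough sketch below lists every candidate copy on sys.path. The file patterns it checks are assumptions about common pip/setuptools layouts, not an exhaustive list, and `find_installations` is a hypothetical helper.

```python
import sys
from pathlib import Path


def find_installations(package, search_paths=None):
    """List every location on the search path that looks like a copy of `package`.

    Checks for a package directory or single-module file, plus
    *.dist-info / *.egg-info / *.egg-link metadata left by pip/setuptools.
    """
    hits = []
    for entry in search_paths if search_paths is not None else sys.path:
        root = Path(entry or ".")  # an empty sys.path entry means the cwd
        if not root.is_dir():  # skips zip archives and missing paths
            continue
        if (root / package).is_dir() or (root / f"{package}.py").is_file():
            hits.append(str(root / package))
        for pattern in (f"{package}*.dist-info", f"{package}*.egg-info", f"{package}*.egg-link"):
            hits.extend(str(p) for p in sorted(root.glob(pattern)))
    return list(dict.fromkeys(hits))  # drop duplicates, keep order


if __name__ == "__main__":
    for hit in find_installations("openlanev2"):
        print(hit)
```

More than one package directory, or more than one set of metadata, is a hint to uninstall the extra copies before re-running the evaluation.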

Thank you for your time.