During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./tools/train.py", line 270, in <module>
main()
File "./tools/train.py", line 233, in main
datasets = [build_dataset(cfg.data.train)]
File "/home/zhangyiqing/mmdetection3d/mmdet3d/datasets/builder.py", line 46, in build_dataset
dataset = build_from_cfg(cfg, MMDET_DATASETS, default_args)
File "/home/zhangyiqing/miniconda3/envs/openlanev2/lib/python3.8/site-packages/mmcv/utils/registry.py", line 69, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
FileNotFoundError: OpenLaneV2SubsetADataset: [Errno 2] No such file or directory: './data/data_dict_subset_A_train.pkl'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 3 (pid: 2472564) of binary: /home/zhangyiqing/miniconda3/envs/openlanev2/bin/python
/home/zhangyiqing/miniconda3/envs/openlanev2/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py:367: UserWarning:
**********************************************************************
CHILD PROCESS FAILED WITH NO ERROR_FILE
**********************************************************************
CHILD PROCESS FAILED WITH NO ERROR_FILE
Child process 2472564 (local_rank 3) FAILED (exitcode 1)
Error msg: Process failed with exitcode 1
Without writing an error file to <N/A>.
While this DOES NOT affect the correctness of your application,
no trace information about the error will be available for inspection.
Consider decorating your top level entrypoint function with
torch.distributed.elastic.multiprocessing.errors.record. Example:
from torch.distributed.elastic.multiprocessing.errors import record
@record
def trainer_main(args):
    # do train
**********************************************************************
warnings.warn(_no_error_file_warning_msg(rank, failure))
Traceback (most recent call last):
File "/home/zhangyiqing/miniconda3/envs/openlanev2/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/zhangyiqing/miniconda3/envs/openlanev2/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/zhangyiqing/miniconda3/envs/openlanev2/lib/python3.8/site-packages/torch/distributed/run.py", line 702, in <module>
main()
File "/home/zhangyiqing/miniconda3/envs/openlanev2/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
return f(*args, **kwargs)
File "/home/zhangyiqing/miniconda3/envs/openlanev2/lib/python3.8/site-packages/torch/distributed/run.py", line 698, in main
run(args)
File "/home/zhangyiqing/miniconda3/envs/openlanev2/lib/python3.8/site-packages/torch/distributed/run.py", line 689, in run
elastic_launch(
File "/home/zhangyiqing/miniconda3/envs/openlanev2/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/zhangyiqing/miniconda3/envs/openlanev2/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
***************************************
./tools/train.py FAILED
It seems that a file named "data_dict_subset_A_train.pkl" is missing. I look forward to your reply. Thank you!
It looks like you are using the TopoMLP code. Please run the preprocessing step in your OpenLane-V2 repo to generate data_dict_subset_A_train.pkl, then link the folder that contains it into the TopoMLP repo so that the file resolves at ./data/data_dict_subset_A_train.pkl.
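If it helps, here is a minimal sketch of the linking step plus a check you can run before launching distributed training, so the missing file is reported once instead of as a ChildFailedError from every rank. The helper names `link_data_dir` and `missing_files` are made up for this example and are not part of TopoMLP or OpenLane-V2. It also assumes the preprocessed .pkl sits directly in the folder you link, and it does not run the preprocessing itself.

```python
import os
from pathlib import Path


def link_data_dir(source_dir, target_dir):
    """Symlink target_dir -> source_dir unless target_dir already exists.

    Hypothetical helper: source_dir is the folder that holds the
    preprocessed .pkl files, target_dir is ./data in the training repo.
    """
    source = Path(source_dir).resolve()
    target = Path(target_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"source data folder not found: {source}")
    if target.exists() or target.is_symlink():
        # Leave an existing folder or link untouched.
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(source, target, target_is_directory=True)
    return target


def missing_files(data_dir, names):
    """Return the expected file names that are absent from data_dir."""
    return [name for name in names if not (Path(data_dir) / name).is_file()]
```

For example, call `link_data_dir("<your OpenLane-V2 data folder>", "./data")` from the TopoMLP root, then `missing_files("./data", ["data_dict_subset_A_train.pkl"])`. An empty list means the path from the traceback now resolves. A non-empty list means preprocessing still needs to be run, or the wrong folder was linked.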
Thank you for your great work! When I run the code, the following error occurs:
Command:
./tools/dist_train.sh projects/configs/topomlp_setA_r50_wo_yolov8.py 8 --work-dir=./work_dirs/topomlp_setA_r50_wo_yolov8
Error: