Megvii-BaseDetection / DenseTeacher

DenseTeacher: Dense Pseudo-Label for Semi-supervised Object Detection
Apache License 2.0

An error has been caught in function '_distributed_worker', process 'SpawnProcess-1' (5890), thread 'MainThread' (139632170628928): #16

liuhaolinwen closed this issue 1 year ago

liuhaolinwen commented 1 year ago

When training the model, I get an error. Here is the log:

2022-10-11 01:04:28 | INFO | runner:89 - EMA model built!
2022-10-11 01:04:28 | INFO | cvpods.checkpoint.checkpoint:111 - Loading checkpoint from detectron2://ImageNetPretrained/MSRA/R-50.pkl
2022-10-11 01:04:28 | ERROR | cvpods.engine.launch:96 - An error has been caught in function '_distributed_worker', process 'SpawnProcess-1' (5890), thread 'MainThread' (139632170628928):
Traceback (most recent call last):

  File "", line 1, in
  File "/sdb_dir/liuhaolin/anaconda3/envs/denseteacher/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
    │ │ └ 4
    │ └ 20
    └ <function _main at 0x7efea4940310>
  File "/sdb_dir/liuhaolin/anaconda3/envs/denseteacher/lib/python3.8/multiprocessing/spawn.py", line 129, in _main
    return self._bootstrap(parent_sentinel)
    │ │ └ 4
    │ └ <function BaseProcess._bootstrap at 0x7efea4b16790>
    └
  File "/sdb_dir/liuhaolin/anaconda3/envs/denseteacher/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
    │ └ <function BaseProcess.run at 0x7efea4b2bdc0>
    └
  File "/sdb_dir/liuhaolin/anaconda3/envs/denseteacher/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    │ │ │ │ │ └ {}
    │ │ │ │ └
    │ │ │ └ (<function _distributed_worker at 0x7efddc98ac10>, 0, (<function main at 0x7efddc929ca0>, 2, 2, 0, 'tcp://127.0.0.1:50162', (...
    │ │ └
    │ └ <function _wrap at 0x7efdf474b8b0>
    └
  File "/sdb_dir/liuhaolin/anaconda3/envs/denseteacher/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
    │ │ └ (<function main at 0x7efddc929ca0>, 2, 2, 0, 'tcp://127.0.0.1:50162', (Namespace(clearml=False, dir='.', dist_url='tcp://127....
    │ └ 0
    └ <function _distributed_worker at 0x7efddc98ac10>

  File "/home/liuhaolin/DenseTeacher/cvpods/cvpods/engine/launch.py", line 96, in _distributed_worker
    main_func(*args)
    │ └ (Namespace(clearml=False, dir='.', dist_url='tcp://127.0.0.1:50162', eval_only=False, machine_rank=0, num_gpus=2, num_machine...
    └ <function main at 0x7efddc929ca0>

  File "/home/liuhaolin/DenseTeacher/cvpods/tools/train_net.py", line 89, in main
    runner.resume_or_load(resume=args.resume)
    │ │ │ └ False
    │ │ └ Namespace(clearml=False, dir='.', dist_url='tcp://127.0.0.1:50162', eval_only=False, machine_rank=0, num_gpus=2, num_machines...
    │ └ <function SemiRunner.resume_or_load at 0x7efddc938d30>
    └ <runner.SemiRunner object at 0x7efddc8d8fa0>

  File "./runner.py", line 167, in resume_or_load
    self.start_iter = (self.checkpointer.resume_or_load(
    │ │ │ │ └ <function Checkpointer.resume_or_load at 0x7efdf3537c10>
    │ │ │ └ <cvpods.checkpoint.checkpoint.DefaultCheckpointer object at 0x7efcad81b160>
    │ │ └ <runner.SemiRunner object at 0x7efddc8d8fa0>
    │ └ 0
    └ <runner.SemiRunner object at 0x7efddc8d8fa0>

  File "/home/liuhaolin/DenseTeacher/cvpods/cvpods/checkpoint/checkpoint.py", line 177, in resume_or_load
    return self.load(path)
    │ │ └ 'detectron2://ImageNetPretrained/MSRA/R-50.pkl'
    │ └ <function Checkpointer.load at 0x7efdf35379d0>
    └ <cvpods.checkpoint.checkpoint.DefaultCheckpointer object at 0x7efcad81b160>

  File "/home/liuhaolin/DenseTeacher/cvpods/cvpods/checkpoint/checkpoint.py", line 112, in load
    assert megfile.smart_isfile(path), "Checkpoint {} not found!".format(path)
    │ │ │ └ 'detectron2://ImageNetPretrained/MSRA/R-50.pkl'
    │ │ └ 'detectron2://ImageNetPretrained/MSRA/R-50.pkl'
    │ └ <function smart_isfile at 0x7efdf358e8b0>
    └ <module 'megfile' from '/home/liuhaolin/.local/lib/python3.8/site-packages/megfile/__init__.py'>

  File "/home/liuhaolin/.local/lib/python3.8/site-packages/megfile/smart.py", line 100, in smart_isfile
    return SmartPath(path).is_file(followlinks=followlinks)
    │ │ └ False
    │ └ 'detectron2://ImageNetPretrained/MSRA/R-50.pkl'
    └ <class 'megfile.smart_path.SmartPath'>
  File "/home/liuhaolin/.local/lib/python3.8/site-packages/megfile/smart_path.py", line 14, in smart_method
    return getattr(self.pathlike, name)(*args, **kwargs)
    │ │ │ │ └ {'followlinks': False}
    │ │ │ └ ()
    │ │ └ 'is_file'
    │ └ Detectron2Path('ImageNetPretrained/MSRA/R-50.pkl')
    └ SmartPath('detectron2://ImageNetPretrained/MSRA/R-50.pkl')

TypeError: is_file() got an unexpected keyword argument 'followlinks'

ZRandomize commented 1 year ago

That's caused by a megfile upgrade. I'll commit a fix for the error; in the meantime, please use megfile==0.1.2.
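For readers wondering why an upgrade produces this particular TypeError: the traceback shows a wrapper forwarding `followlinks=False` to the `is_file` method of the underlying `Detectron2Path` object, which does not accept that keyword. The sketch below reproduces the same failure mode in isolation. `LegacyPath` and `ForwardingPath` are hypothetical stand-ins invented for illustration; they are not megfile or cvpods classes.

```python
class LegacyPath:
    """Hypothetical path class written against an older interface."""

    def __init__(self, path):
        self.path = path

    def is_file(self):
        # Note: no `followlinks` parameter in this signature.
        return False


class ForwardingPath:
    """Hypothetical wrapper that forwards calls to the wrapped path object."""

    def __init__(self, pathlike):
        self.pathlike = pathlike

    def is_file(self, **kwargs):
        # Keyword arguments are passed through unchanged to the wrapped object.
        return getattr(self.pathlike, "is_file")(**kwargs)


def reproduce():
    """Return the TypeError message raised by the mismatched call, or None."""
    wrapper = ForwardingPath(LegacyPath("detectron2://ImageNetPretrained/MSRA/R-50.pkl"))
    try:
        wrapper.is_file(followlinks=False)
    except TypeError as exc:
        return str(exc)
    return None


print(reproduce())
```

Pinning the caller to a version that matches the older signature, as suggested above, removes the mismatch without touching either class.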

liuhaolinwen commented 1 year ago

Great, it solved my problem.

MRNEVERMISS commented 1 year ago

pip install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com megfile==0.1.2
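After installing, it may be worth confirming that the environment used for training really picked up the pinned version, since the traceback shows megfile being loaded from a user-level site-packages directory rather than the conda environment. A minimal sketch using only the standard library (Python 3.8+); the helper names `installed_version` and `matches_pin` are made up for this example:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned):
    """True only when a version is installed and equals the pinned string."""
    return installed is not None and installed == pinned


found = installed_version("megfile")
if matches_pin(found, "0.1.2"):
    print("megfile is pinned correctly:", found)
else:
    print("megfile version is", found, "but 0.1.2 was expected")
```

Run it with the same interpreter that launches training so that the result reflects the package that will actually be imported.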