Meituan-AutoML / Twins

Two simple and effective vision transformer designs that are on par with the Swin Transformer
Apache License 2.0

Can we train or test on single GPU in detection sections? #24

Open nestor0003 opened 2 years ago

nestor0003 commented 2 years ago

If we want to test on the detection task, can we just use the shell command, e.g. `bash dist_test.sh configs/retinanet_alt_gvt_s_fpn_1x_coco_pvt_setting.py checkpoint_file 1 --eval mAP`?

Or do we need to change the lr and the number of dataloader workers? I'm a beginner with the mmdet framework, please help... These are the error lines:

/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future. Migrate to torch.distributed.run
WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases. Please read local_rank from os.environ('LOCAL_RANK') instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
  entrypoint       : ./test.py
  min_nodes        : 1
  max_nodes        : 1
  nproc_per_node   : 1
  run_id           : none
  rdzv_backend     : static
  rdzv_endpoint    : 127.0.0.1:29500
  rdzv_configs     : {'rank': 0, 'timeout': 900}
  max_restarts     : 3
  monitor_interval : 5
  log_dir          : None
  metrics_cfg      : {}

INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_o5bp99y9/none_u2fqutod
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
  warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
  restart_count=0
  master_addr=127.0.0.1
  master_port=29500
  group_rank=0
  group_world_size=1
  local_ranks=[0]
  role_ranks=[0]
  global_ranks=[0]
  role_world_sizes=[1]
  global_world_sizes=[1]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_o5bp99y9/none_u2fqutod/attempt_0/0/error.json
loading annotations into memory...
Done (t=0.52s)
creating index...
index created!
Traceback (most recent call last):
  File "./test.py", line 213, in <module>
    main()
  File "./test.py", line 166, in main
    model = build_detector(cfg.model, train_cfg=None, test_cfg=cfg.test_cfg)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/builder.py", line 67, in build_detector
    return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/builder.py", line 32, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmcv/utils/registry.py", line 171, in build_from_cfg
    return obj_cls(**args)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/detectors/retinanet.py", line 16, in __init__
    super(RetinaNet, self).__init__(backbone, neck, bbox_head, train_cfg,
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/detectors/single_stage.py", line 25, in __init__
    self.backbone = build_backbone(backbone)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/builder.py", line 37, in build_backbone
    return build(cfg, BACKBONES)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/builder.py", line 32, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmcv/utils/registry.py", line 171, in build_from_cfg
    return obj_cls(**args)
  File "/home/user/project/Twins/detection/gvt.py", line 482, in __init__
    super(alt_gvt_small, self).__init__(
  File "/home/user/project/Twins/detection/gvt.py", line 419, in __init__
    super(ALTGVT, self).__init__(img_size, patch_size, in_chans, num_classes, embed_dims, num_heads,
  File "/home/user/project/Twins/detection/gvt.py", line 408, in __init__
    super(PCPVT, self).__init__(img_size, patch_size, in_chans, num_classes, embed_dims, num_heads,
  File "/home/user/project/Twins/detection/gvt.py", line 343, in __init__
    super(CPVTV2, self).__init__(img_size, patch_size, in_chans, num_classes, embed_dims, num_heads, mlp_ratios,
  File "/home/user/project/Twins/detection/gvt.py", line 234, in __init__
    _block = nn.ModuleList([block_cls(
  File "/home/user/project/Twins/detection/gvt.py", line 234, in <listcomp>
    _block = nn.ModuleList([block_cls(
  File "/home/user/project/Twins/detection/gvt.py", line 164, in __init__
    super(GroupBlock, self).__init__(dim, num_heads, mlp_ratio, qkv_bias, qk_scale, drop, attn_drop,
TypeError: __init__() takes from 3 to 10 positional arguments but 11 were given
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 11449) of binary: /home/user/miniconda3/envs/twins/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
  restart_count=1
  master_addr=127.0.0.1
  master_port=29500
  group_rank=0
  group_world_size=1
  local_ranks=[0]
  role_ranks=[0]
  global_ranks=[0]
  role_world_sizes=[1]
  global_world_sizes=[1]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_o5bp99y9/none_u2fqutod/attempt_1/0/error.json
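A note on the traceback: the failure is not caused by the number of GPUs. The last frame shows `GroupBlock.__init__` in `gvt.py` forwarding eleven positional arguments to a parent `__init__` that accepts at most ten, which is the classic symptom of a dependency whose constructor signature has shifted (for example, a `timm` release that no longer takes `qk_scale` — an assumption worth checking against the versions pinned by this repository). A minimal sketch of the failure mode and the keyword-argument workaround, using simplified stand-in classes rather than the actual Twins code:

```python
# Simplified stand-ins for the classes in the traceback; NOT the real
# timm Block / Twins GroupBlock implementations.

class Block:
    # Imagine an upstream API change removed the old `qk_scale` parameter.
    def __init__(self, dim, num_heads, mlp_ratio=4.0, qkv_bias=False,
                 drop=0.0, attn_drop=0.0, drop_path=0.0):
        self.dim = dim
        self.num_heads = num_heads

class GroupBlock(Block):
    def __init__(self, dim, num_heads, mlp_ratio, qkv_bias, qk_scale,
                 drop, attn_drop, drop_path, ws=1):
        # Forwarding everything positionally breaks as soon as the parent's
        # signature shifts by one parameter:
        #   super().__init__(dim, num_heads, mlp_ratio, qkv_bias, qk_scale,
        #                    drop, attn_drop, drop_path)  # -> TypeError
        # Forwarding by keyword (and dropping the removed argument) survives:
        super().__init__(dim, num_heads, mlp_ratio=mlp_ratio,
                         qkv_bias=qkv_bias, drop=drop,
                         attn_drop=attn_drop, drop_path=drop_path)
        self.ws = ws

GroupBlock(64, 4, 4.0, True, None, 0.0, 0.0, 0.0)  # constructs without error
```

As for single-GPU testing itself: mmdetection's test entry point can usually be run without the distributed launcher at all, e.g. `python test.py configs/retinanet_alt_gvt_s_fpn_1x_coco_pvt_setting.py checkpoint_file --eval bbox` (COCO detection evaluation in mmdetection is named `bbox` rather than `mAP`; both the flag and the metric name should be verified against the installed version). The `TypeError` above would occur either way, because it is raised while the model is being built.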

cxxgtxy commented 2 years ago

We suggest using at least 4 GPUs to train.
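If training does have to run on fewer GPUs than the released schedule assumes, two things usually need adjusting in an mmdetection 2.x config: the dataloader settings and the learning rate, via the linear scaling rule (keep lr proportional to the total batch size). A minimal sketch — the field names follow the standard mmdetection 2.x schema, and the base values (lr 1e-4 at a total batch of 16, i.e. 8 GPUs x 2 images) are assumptions taken from the usual PVT-style setting, not verified against this repository:

```python
# Hypothetical overrides for training on fewer GPUs (mmdetection 2.x schema).
# Base lr / batch size are assumptions (common PVT-style 1x schedule).
base_lr = 1e-4      # reference lr for a total batch of 16
base_batch = 16     # 8 GPUs x 2 images per GPU

num_gpus = 4        # the suggested minimum
samples_per_gpu = 2

data = dict(
    samples_per_gpu=samples_per_gpu,  # images per GPU per iteration
    workers_per_gpu=2,                # dataloader worker processes per GPU
)

# Linear scaling rule: keep lr / total_batch_size constant.
optimizer = dict(
    type='AdamW',
    lr=base_lr * (num_gpus * samples_per_gpu) / base_batch,  # 5e-5 for 4 GPUs
)
```

With `num_gpus = 1` the same rule gives 1.25e-5; whether such a small batch still reproduces the reported numbers is a separate question.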

We still have some internship openings. If you are interested, please send your CV to chuxiangxiang@meituan.com. Thanks.