DongSky / few-shot-vit

ViT for few-shot classification

RuntimeError: DataLoader worker (pid 7572) is killed by signal: Killed. #11

Open lgx12345678 opened 2 months ago

lgx12345678 commented 2 months ago

(py38) wl_ligexian@9f9da370a0bd:~/few-shot-vit-main/meta_tuning_sun_d$ python train_meta.py -deepemd grid -patch_list 2,3 -shot 1 -way 5 -solver opencv -gpu 0 -save_all
{'backbone': 'visformer', 'bs': 1, 'data_dir': '/public/home/wl_ligexian/few-shot-vit-main/test_phase/materials/', 'dataset': 'miniimagenet', 'deepemd': 'grid', 'extra_dir': None, 'feature_pyramid': None, 'form': 'L2', 'gamma': 0.5, 'gpu': '0', 'l2_strength': 1e-06, 'lr': 0.0005, 'max_epoch': 100, 'metric': 'cosine', 'norm': 'center', 'num_patch': 9, 'patch_list': '2,3', 'patch_ratio': 2, 'pretrain_dir': 'visformer_mini_1shot_ckpt.pth', 'query': 15, 'random_val_task': False, 'save_all': True, 'seed': 12345, 'set': 'val', 'sfc_bs': 4, 'sfc_lr': 0.1, 'sfc_update_step': 100, 'sfc_wd': 0, 'shot': 1, 'solver': 'opencv', 'step_size': 10, 'temperature': 12.5, 'test_episode': 2000, 'val_episode': 2000, 'val_frequency': 50, 'way': 5}
manual seed: 12345
use gpu: [0]
odict_keys(['encoder.pos_embed1', 'encoder.pos_embed2', 'encoder.pos_embed3', 'encoder.stem.conv1.weight', 'encoder.stem.bn1.weight', 'encoder.stem.bn1.bias', 'encoder.stem.bn1.running_mean', 'encoder.stem.bn1.running_var', 'encoder.stem.bn1.num_batches_tracked', 'encoder.stem.conv2.weight', 'encoder.stem.bn2.weight', 'encoder.stem.bn2.bias', 'encoder.stem.bn2.running_mean', 'encoder.stem.bn2.running_var', 'encoder.stem.bn2.num_batches_tracked', 'encoder.stem.conv3.weight', 'encoder.stem.bn3.weight', 'encoder.stem.bn3.bias', 'encoder.stem.bn3.running_mean', 'encoder.stem.bn3.running_var', 'encoder.stem.bn3.num_batches_tracked', 'encoder.stem.downsample.0.weight', 'encoder.stem.downsample.1.weight', 'encoder.stem.downsample.1.bias', 'encoder.stem.downsample.1.running_mean', 'encoder.stem.downsample.1.running_var', 'encoder.stem.downsample.1.num_batches_tracked', 'encoder.stage1.0.norm2.bn.weight', 'encoder.stage1.0.norm2.bn.bias', 'encoder.stage1.0.norm2.bn.running_mean', 'encoder.stage1.0.norm2.bn.running_var', 
'encoder.stage1.0.norm2.bn.num_batches_tracked', 'encoder.stage1.0.mlp.conv1.weight', 'encoder.stage1.0.mlp.conv2.weight', 'encoder.stage1.0.mlp.conv3.weight', 'encoder.stage1.1.norm2.bn.weight', 'encoder.stage1.1.norm2.bn.bias', 'encoder.stage1.1.norm2.bn.running_mean', 'encoder.stage1.1.norm2.bn.running_var', 'encoder.stage1.1.norm2.bn.num_batches_tracked', 'encoder.stage1.1.mlp.conv1.weight', 'encoder.stage1.1.mlp.conv2.weight', 'encoder.stage1.1.mlp.conv3.weight', 'encoder.stage1.2.norm2.bn.weight', 'encoder.stage1.2.norm2.bn.bias', 'encoder.stage1.2.norm2.bn.running_mean', 'encoder.stage1.2.norm2.bn.running_var', 'encoder.stage1.2.norm2.bn.num_batches_tracked', 'encoder.stage1.2.mlp.conv1.weight', 'encoder.stage1.2.mlp.conv2.weight', 'encoder.stage1.2.mlp.conv3.weight', 'encoder.stage1.3.norm2.bn.weight', 'encoder.stage1.3.norm2.bn.bias', 'encoder.stage1.3.norm2.bn.running_mean', 'encoder.stage1.3.norm2.bn.running_var', 'encoder.stage1.3.norm2.bn.num_batches_tracked', 'encoder.stage1.3.mlp.conv1.weight', 'encoder.stage1.3.mlp.conv2.weight', 'encoder.stage1.3.mlp.conv3.weight', 'encoder.patch_embed2.proj.weight', 'encoder.patch_embed2.proj.bias', 'encoder.patch_embed2.norm.bn.weight', 'encoder.patch_embed2.norm.bn.bias', 'encoder.patch_embed2.norm.bn.running_mean', 'encoder.patch_embed2.norm.bn.running_var', 'encoder.patch_embed2.norm.bn.num_batches_tracked', 'encoder.stage2.0.norm1.bn.weight', 'encoder.stage2.0.norm1.bn.bias', 'encoder.stage2.0.norm1.bn.running_mean', 'encoder.stage2.0.norm1.bn.running_var', 'encoder.stage2.0.norm1.bn.num_batches_tracked', 'encoder.stage2.0.attn.qkv.weight', 'encoder.stage2.0.attn.proj.weight', 'encoder.stage2.0.norm2.bn.weight', 'encoder.stage2.0.norm2.bn.bias', 'encoder.stage2.0.norm2.bn.running_mean', 'encoder.stage2.0.norm2.bn.running_var', 'encoder.stage2.0.norm2.bn.num_batches_tracked', 'encoder.stage2.0.mlp.conv1.weight', 'encoder.stage2.0.mlp.conv3.weight', 'encoder.stage2.1.norm1.bn.weight', 
'encoder.stage2.1.norm1.bn.bias', 'encoder.stage2.1.norm1.bn.running_mean', 'encoder.stage2.1.norm1.bn.running_var', 'encoder.stage2.1.norm1.bn.num_batches_tracked', 'encoder.stage2.1.attn.qkv.weight', 'encoder.stage2.1.attn.proj.weight', 'encoder.stage2.1.norm2.bn.weight', 'encoder.stage2.1.norm2.bn.bias', 'encoder.stage2.1.norm2.bn.running_mean', 'encoder.stage2.1.norm2.bn.running_var', 'encoder.stage2.1.norm2.bn.num_batches_tracked', 'encoder.stage2.1.mlp.conv1.weight', 'encoder.stage2.1.mlp.conv3.weight', 'encoder.patch_embed3.proj.weight', 'encoder.patch_embed3.proj.bias', 'encoder.patch_embed3.norm.bn.weight', 'encoder.patch_embed3.norm.bn.bias', 'encoder.patch_embed3.norm.bn.running_mean', 'encoder.patch_embed3.norm.bn.running_var', 'encoder.patch_embed3.norm.bn.num_batches_tracked', 'encoder.stage3.0.norm1.bn.weight', 'encoder.stage3.0.norm1.bn.bias', 'encoder.stage3.0.norm1.bn.running_mean', 'encoder.stage3.0.norm1.bn.running_var', 'encoder.stage3.0.norm1.bn.num_batches_tracked', 'encoder.stage3.0.attn.qkv.weight', 'encoder.stage3.0.attn.proj.weight', 'encoder.stage3.0.norm2.bn.weight', 'encoder.stage3.0.norm2.bn.bias', 'encoder.stage3.0.norm2.bn.running_mean', 'encoder.stage3.0.norm2.bn.running_var', 'encoder.stage3.0.norm2.bn.num_batches_tracked', 'encoder.stage3.0.mlp.conv1.weight', 'encoder.stage3.0.mlp.conv3.weight', 'encoder.stage3.1.norm1.bn.weight', 'encoder.stage3.1.norm1.bn.bias', 'encoder.stage3.1.norm1.bn.running_mean', 'encoder.stage3.1.norm1.bn.running_var', 'encoder.stage3.1.norm1.bn.num_batches_tracked', 'encoder.stage3.1.attn.qkv.weight', 'encoder.stage3.1.attn.proj.weight', 'encoder.stage3.1.norm2.bn.weight', 'encoder.stage3.1.norm2.bn.bias', 'encoder.stage3.1.norm2.bn.running_mean', 'encoder.stage3.1.norm2.bn.running_var', 'encoder.stage3.1.norm2.bn.num_batches_tracked', 'encoder.stage3.1.mlp.conv1.weight', 'encoder.stage3.1.mlp.conv3.weight', 'encoder.stage3.2.norm1.bn.weight', 'encoder.stage3.2.norm1.bn.bias', 
'encoder.stage3.2.norm1.bn.running_mean', 'encoder.stage3.2.norm1.bn.running_var', 'encoder.stage3.2.norm1.bn.num_batches_tracked', 'encoder.stage3.2.attn.qkv.weight', 'encoder.stage3.2.attn.proj.weight', 'encoder.stage3.2.norm2.bn.weight', 'encoder.stage3.2.norm2.bn.bias', 'encoder.stage3.2.norm2.bn.running_mean', 'encoder.stage3.2.norm2.bn.running_var', 'encoder.stage3.2.norm2.bn.num_batches_tracked', 'encoder.stage3.2.mlp.conv1.weight', 'encoder.stage3.2.mlp.conv3.weight', 'encoder.norm.bn.weight', 'encoder.norm.bn.bias', 'encoder.norm.bn.running_mean', 'encoder.norm.bn.running_var', 'encoder.norm.bn.num_batches_tracked'])
loading model from : visformer_mini_1shot_ckpt.pth
detect temp variable, delete it
odict_keys(['encoder.pos_embed1', 'encoder.pos_embed2', 'encoder.pos_embed3', 'encoder.stem.conv1.weight', 'encoder.stem.bn1.weight', 'encoder.stem.bn1.bias', 'encoder.stem.bn1.running_mean', 'encoder.stem.bn1.running_var', 'encoder.stem.bn1.num_batches_tracked', 'encoder.stem.conv2.weight', 'encoder.stem.bn2.weight', 'encoder.stem.bn2.bias', 'encoder.stem.bn2.running_mean', 'encoder.stem.bn2.running_var', 'encoder.stem.bn2.num_batches_tracked', 'encoder.stem.conv3.weight', 'encoder.stem.bn3.weight', 'encoder.stem.bn3.bias', 'encoder.stem.bn3.running_mean', 'encoder.stem.bn3.running_var', 'encoder.stem.bn3.num_batches_tracked', 'encoder.stem.downsample.0.weight', 'encoder.stem.downsample.1.weight', 'encoder.stem.downsample.1.bias', 'encoder.stem.downsample.1.running_mean', 'encoder.stem.downsample.1.running_var', 'encoder.stem.downsample.1.num_batches_tracked', 'encoder.stage1.0.norm2.bn.weight', 'encoder.stage1.0.norm2.bn.bias', 'encoder.stage1.0.norm2.bn.running_mean', 'encoder.stage1.0.norm2.bn.running_var', 'encoder.stage1.0.norm2.bn.num_batches_tracked', 'encoder.stage1.0.mlp.conv1.weight', 'encoder.stage1.0.mlp.conv2.weight', 'encoder.stage1.0.mlp.conv3.weight', 'encoder.stage1.1.norm2.bn.weight', 'encoder.stage1.1.norm2.bn.bias', 
'encoder.stage1.1.norm2.bn.running_mean', 'encoder.stage1.1.norm2.bn.running_var', 'encoder.stage1.1.norm2.bn.num_batches_tracked', 'encoder.stage1.1.mlp.conv1.weight', 'encoder.stage1.1.mlp.conv2.weight', 'encoder.stage1.1.mlp.conv3.weight', 'encoder.stage1.2.norm2.bn.weight', 'encoder.stage1.2.norm2.bn.bias', 'encoder.stage1.2.norm2.bn.running_mean', 'encoder.stage1.2.norm2.bn.running_var', 'encoder.stage1.2.norm2.bn.num_batches_tracked', 'encoder.stage1.2.mlp.conv1.weight', 'encoder.stage1.2.mlp.conv2.weight', 'encoder.stage1.2.mlp.conv3.weight', 'encoder.stage1.3.norm2.bn.weight', 'encoder.stage1.3.norm2.bn.bias', 'encoder.stage1.3.norm2.bn.running_mean', 'encoder.stage1.3.norm2.bn.running_var', 'encoder.stage1.3.norm2.bn.num_batches_tracked', 'encoder.stage1.3.mlp.conv1.weight', 'encoder.stage1.3.mlp.conv2.weight', 'encoder.stage1.3.mlp.conv3.weight', 'encoder.patch_embed2.proj.weight', 'encoder.patch_embed2.proj.bias', 'encoder.patch_embed2.norm.bn.weight', 'encoder.patch_embed2.norm.bn.bias', 'encoder.patch_embed2.norm.bn.running_mean', 'encoder.patch_embed2.norm.bn.running_var', 'encoder.patch_embed2.norm.bn.num_batches_tracked', 'encoder.stage2.0.norm1.bn.weight', 'encoder.stage2.0.norm1.bn.bias', 'encoder.stage2.0.norm1.bn.running_mean', 'encoder.stage2.0.norm1.bn.running_var', 'encoder.stage2.0.norm1.bn.num_batches_tracked', 'encoder.stage2.0.attn.qkv.weight', 'encoder.stage2.0.attn.proj.weight', 'encoder.stage2.0.norm2.bn.weight', 'encoder.stage2.0.norm2.bn.bias', 'encoder.stage2.0.norm2.bn.running_mean', 'encoder.stage2.0.norm2.bn.running_var', 'encoder.stage2.0.norm2.bn.num_batches_tracked', 'encoder.stage2.0.mlp.conv1.weight', 'encoder.stage2.0.mlp.conv3.weight', 'encoder.stage2.1.norm1.bn.weight', 'encoder.stage2.1.norm1.bn.bias', 'encoder.stage2.1.norm1.bn.running_mean', 'encoder.stage2.1.norm1.bn.running_var', 'encoder.stage2.1.norm1.bn.num_batches_tracked', 'encoder.stage2.1.attn.qkv.weight', 'encoder.stage2.1.attn.proj.weight', 
'encoder.stage2.1.norm2.bn.weight', 'encoder.stage2.1.norm2.bn.bias', 'encoder.stage2.1.norm2.bn.running_mean', 'encoder.stage2.1.norm2.bn.running_var', 'encoder.stage2.1.norm2.bn.num_batches_tracked', 'encoder.stage2.1.mlp.conv1.weight', 'encoder.stage2.1.mlp.conv3.weight', 'encoder.patch_embed3.proj.weight', 'encoder.patch_embed3.proj.bias', 'encoder.patch_embed3.norm.bn.weight', 'encoder.patch_embed3.norm.bn.bias', 'encoder.patch_embed3.norm.bn.running_mean', 'encoder.patch_embed3.norm.bn.running_var', 'encoder.patch_embed3.norm.bn.num_batches_tracked', 'encoder.stage3.0.norm1.bn.weight', 'encoder.stage3.0.norm1.bn.bias', 'encoder.stage3.0.norm1.bn.running_mean', 'encoder.stage3.0.norm1.bn.running_var', 'encoder.stage3.0.norm1.bn.num_batches_tracked', 'encoder.stage3.0.attn.qkv.weight', 'encoder.stage3.0.attn.proj.weight', 'encoder.stage3.0.norm2.bn.weight', 'encoder.stage3.0.norm2.bn.bias', 'encoder.stage3.0.norm2.bn.running_mean', 'encoder.stage3.0.norm2.bn.running_var', 'encoder.stage3.0.norm2.bn.num_batches_tracked', 'encoder.stage3.0.mlp.conv1.weight', 'encoder.stage3.0.mlp.conv3.weight', 'encoder.stage3.1.norm1.bn.weight', 'encoder.stage3.1.norm1.bn.bias', 'encoder.stage3.1.norm1.bn.running_mean', 'encoder.stage3.1.norm1.bn.running_var', 'encoder.stage3.1.norm1.bn.num_batches_tracked', 'encoder.stage3.1.attn.qkv.weight', 'encoder.stage3.1.attn.proj.weight', 'encoder.stage3.1.norm2.bn.weight', 'encoder.stage3.1.norm2.bn.bias', 'encoder.stage3.1.norm2.bn.running_mean', 'encoder.stage3.1.norm2.bn.running_var', 'encoder.stage3.1.norm2.bn.num_batches_tracked', 'encoder.stage3.1.mlp.conv1.weight', 'encoder.stage3.1.mlp.conv3.weight', 'encoder.stage3.2.norm1.bn.weight', 'encoder.stage3.2.norm1.bn.bias', 'encoder.stage3.2.norm1.bn.running_mean', 'encoder.stage3.2.norm1.bn.running_var', 'encoder.stage3.2.norm1.bn.num_batches_tracked', 'encoder.stage3.2.attn.qkv.weight', 'encoder.stage3.2.attn.proj.weight', 'encoder.stage3.2.norm2.bn.weight', 
'encoder.stage3.2.norm2.bn.bias', 'encoder.stage3.2.norm2.bn.running_mean', 'encoder.stage3.2.norm2.bn.running_var', 'encoder.stage3.2.norm2.bn.num_batches_tracked', 'encoder.stage3.2.mlp.conv1.weight', 'encoder.stage3.2.mlp.conv3.weight', 'encoder.norm.bn.weight', 'encoder.norm.bn.bias', 'encoder.norm.bn.running_mean', 'encoder.norm.bn.running_var', 'encoder.norm.bn.num_batches_tracked'])
/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/dataloader.py:561: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 3, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
fix val set for all epochs
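The UserWarning compares the 4 requested DataLoader workers against the CPUs available to the process (3 on this machine). A minimal sketch of clamping the worker count to that limit, assuming Linux (`os.sched_getaffinity`) with a fallback to `os.cpu_count()`; it only roughly mirrors the check PyTorch performs, and `suggested_max_workers` / `capped_workers` are hypothetical helpers, not part of this repo or of PyTorch.

```python
import os


def suggested_max_workers() -> int:
    # Assumption: count the CPUs this process may run on (Linux affinity mask)
    # when available, otherwise fall back to the total logical CPU count.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def capped_workers(requested: int) -> int:
    # Hypothetical helper: clamp the requested worker count to [0, suggested max].
    return max(0, min(requested, suggested_max_workers()))


if __name__ == "__main__":
    # On the machine in this report, the warning implies this would print 3.
    print(capped_workers(4))
```

The result would be passed as `num_workers` when the validation DataLoader is built. Fewer workers means fewer forked processes, but by itself this is unlikely to fix a memory problem in the main process.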

Traceback (most recent call last):
  File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1133, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/queue.py", line 179, in get
    self.not_empty.wait(remaining)
  File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/threading.py", line 306, in wait
    gotit = waiter.acquire(True, timeout)
  File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 7572) is killed by signal: Killed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train_meta.py", line 119, in <module>
    val_loader=[x for x in val_loader]
  File "train_meta.py", line 119, in <listcomp>
    val_loader=[x for x in val_loader]
  File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1329, in _next_data
    idx, data = self._get_data()
  File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1285, in _get_data
    success, data = self._try_get_data()
  File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1146, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 7572, 7629) exited unexpectedly
terminate called without an active exception
Aborted
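Line 119 of `train_meta.py` (`val_loader=[x for x in val_loader]`) collects every validation episode into a Python list before training starts, and `signal: Killed` is commonly the Linux OOM killer sending SIGKILL. A rough back-of-the-envelope sketch of how much memory that list could need, using `val_episode`, `way`, `shot` and `query` from the printed arguments. The 3×224×224 float32 image size and one crop per image are assumptions not confirmed by the log, and the real footprint may be larger (for example if `-deepemd grid` adds patches per image). `cached_val_bytes` is a hypothetical helper.

```python
def cached_val_bytes(
    episodes: int,
    way: int,
    shot: int,
    query: int,
    channels: int = 3,         # assumption: RGB input
    height: int = 224,         # assumption: 224x224 input resolution
    width: int = 224,
    bytes_per_elem: int = 4,   # assumption: float32 tensors
    crops_per_image: int = 1,  # assumption: no extra patches per image
) -> int:
    """Estimate the bytes held by storing every episode's images in a list."""
    images_per_episode = way * (shot + query)
    bytes_per_image = channels * height * width * bytes_per_elem * crops_per_image
    return episodes * images_per_episode * bytes_per_image


# Values from the printed arguments: val_episode=2000, way=5, shot=1, query=15.
total = cached_val_bytes(episodes=2000, way=5, shot=1, query=15)
print(f"{total / 1024**3:.1f} GiB")  # prints 89.7 GiB
```

Under these assumptions the cached list alone would be on the order of 90 GiB. If a fixed validation set is the goal, caching only the sampled episode indices (a hypothetical alternative) would keep the set fixed without holding every image tensor in memory at once.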