Open CachCheng opened 4 months ago
[2024-06-27 16:04:18 internimage_b_1k_224](main.py 354): INFO Max accuracy: 99.56%
[2024-06-27 16:04:19 internimage_b_1k_224](main.py 561): INFO Test: [0/25] Time 0.757 (0.757) Loss 8.1779 (8.1779) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 20310MB
[2024-06-27 16:04:22 internimage_b_1k_224](main.py 561): INFO Test: [10/25] Time 0.308 (0.349) Loss 8.1767 (8.2419) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 20310MB
[2024-06-27 16:04:25 internimage_b_1k_224](main.py 561): INFO Test: [20/25] Time 0.308 (0.329) Loss 8.4152 (8.3329) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 20310MB
[2024-06-27 16:04:27 internimage_b_1k_224](main.py 568): INFO [Epoch:6] * Acc@1 0.000 Acc@5 0.000
[2024-06-27 16:04:27 internimage_b_1k_224](main.py 359): INFO Accuracy of the ema network on the 1600 test images: 0.0%
[2024-06-27 16:04:27 internimage_b_1k_224](main.py 374): INFO Max ema accuracy: 0.00%
[2024-06-27 16:04:28 internimage_b_1k_224](main.py 506): INFO Train: [7/300][0/100] eta 0:01:55 lr 0.000022 time 1.1594 (1.1594) model_time 0.6929 (0.6929) loss 1.5148 (1.5148) grad_norm 6.1735 (6.1735/0.0000) mem 20310MB
[2024-06-27 16:04:35 internimage_b_1k_224](main.py 506): INFO Train: [7/300][10/100] eta 0:01:06 lr 0.000022 time 0.6905 (0.7409) model_time 0.6904 (0.6983) loss 1.2380 (1.4515) grad_norm 6.4725 (6.8944/2.1804) mem 20310MB
[2024-06-27 16:04:42 internimage_b_1k_224](main.py 506): INFO Train: [7/300][20/100] eta 0:00:57 lr 0.000023 time 0.6914 (0.7173) model_time 0.6912 (0.6949) loss 1.5632 (1.4158) grad_norm 5.4286 (6.2839/2.3783) mem 20310MB
During training, why does the log report `INFO Max accuracy: 99.56%` while `Max ema accuracy` stays at 0.00%? What is going on?
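One possible explanation, offered only as a guess from the log above: with 100 iterations per epoch, the EMA network has seen roughly 700 updates by the end of epoch 6. If the EMA decay is close to 1 (the value 0.9999 below is an assumption, not read from this run's config), the EMA weights are still dominated by their initial values, so the EMA accuracy can lag far behind the regular model. The sketch below computes how much of the initial weights remains under the standard update `ema = decay * ema + (1 - decay) * w`.

```python
def initial_weight_fraction(decay: float, steps: int) -> float:
    """Fraction of the EMA weights that still comes from the initial
    weights after `steps` updates of ema = decay * ema + (1 - decay) * w."""
    return decay ** steps


if __name__ == "__main__":
    # 7 epochs (0..6) x 100 iterations per epoch, as read from the log.
    steps = 7 * 100
    # Decay values are illustrative assumptions, not taken from the config.
    for decay in (0.9999, 0.999, 0.99):
        fraction = initial_weight_fraction(decay, steps)
        print(f"decay={decay}: {fraction:.4f} of the initial weights remain")
```

If this is the cause, the EMA accuracy should start to rise after more iterations, or sooner with a smaller decay on a dataset this small.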
(llm) ahs@ahs-SYS-4029GP-TRTC-ZY001:/meta/cash/llm/InternImage/classification$ python export.py --model_name internimage_b_1k_224 --ckpt_dir output/internimage_b_1k_224 --trt
=> merge config from ./configs/internimage_b_1k_224.yaml
using core type: DCNv3
using activation layer: GELU
using main norm layer: LN
using dpr: linear, 0.5
level2_post_norm: False
level2_post_norm_block_ids: None
res_post_norm: False
remove_center: False
============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
Traceback (most recent call last):
File "/meta/cash/llm/InternImage/classification/export.py", line 121, in
Traceback (most recent call last):
File "/meta/cash/llm/InternImage/classification/main.py", line 661, in <module>
main(config)
File "/meta/cash/llm/InternImage/classification/main.py", line 275, in main
acc1, acc5, loss = validate(config, data_loader_val, model)
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/meta/cash/llm/InternImage/classification/main.py", line 531, in validate
for idx, (images, target) in enumerate(data_loader):
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
data = self._next_data()
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/_utils.py", line 644, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/meta/cash/llm/InternImage/classification/dataset/cached_image_folder.py", line 323, in __getitem__
img = self.transform(img)
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torchvision/transforms/transforms.py", line 95, in __call__
img = t(img)
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torchvision/transforms/transforms.py", line 137, in __call__
return F.to_tensor(pic)
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torchvision/transforms/functional.py", line 166, in to_tensor
img = torch.from_numpy(np.array(pic, mode_to_nptype.get(pic.mode, np.uint8), copy=True))
RuntimeError: Numpy is not available
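A commonly reported cause of `RuntimeError: Numpy is not available` is an older PyTorch build (the log shows 2.0.1+cu117) running alongside NumPy 2.x, which it was not compiled against. Whether that applies here is an assumption, since the installed NumPy version is not shown. The sketch below uses only the standard library to check the installed versions; the helper names are made up for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def numpy_major(version: str) -> int:
    """Extract the major version number from a string such as '1.26.4'."""
    return int(version.split(".")[0])


def numpy_needs_attention(numpy_version: Optional[str]) -> bool:
    """Heuristic (an assumption, not an official rule): flag a missing
    NumPy or a NumPy 2.x install as worth checking against the torch build."""
    return numpy_version is None or numpy_major(numpy_version) >= 2


if __name__ == "__main__":
    numpy_version = installed_version("numpy")
    print("torch:", installed_version("torch"))
    print("numpy:", numpy_version)
    if numpy_needs_attention(numpy_version):
        print("NumPy is missing or 2.x; an older torch build may need numpy<2.")
```

If the check flags NumPy 2.x, installing a 1.x release in the same environment (for example `pip install "numpy<2"`) is the usual thing to try.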