Edwardmark opened this issue 4 years ago (status: Open)
I have the same problem: it is slower than the built-in Dataset. Did you ever solve it?
These scripts are legacy code from quite a while ago (~torch 0.4), and I am not sure what has changed in the DataLoader since then. Let me run some tests.
At first I stored the preprocessed features in the database; the features were too large, so loading was slow. Later I stored only each image's raw buffer (the encoded bytes) instead, and it became fast.
@Lyken17 Hi, I first tried pytorch-1.10 (cuda10.2, python3.8) on a single GPU (1080ti). It is too slow; the log is as follows:
Epoch: [0][0/10010] Time 15.811 (15.811) Data 14.544 (14.544) Loss 7.0312 (7.0312) Acc@1 0.000 (0.000) Acc@5 0.781 (0.781)
Epoch: [0][10/10010] Time 0.213 (4.024) Data 0.000 (3.770) Loss 7.3495 (7.1619) Acc@1 0.000 (0.213) Acc@5 0.000 (0.497)
Epoch: [0][20/10010] Time 10.217 (4.017) Data 10.129 (3.817) Loss 7.2931 (7.2333) Acc@1 0.000 (0.223) Acc@5 0.000 (0.595)
Epoch: [0][30/10010] Time 0.213 (3.740) Data 0.000 (3.556) Loss 7.0012 (7.1996) Acc@1 0.000 (0.176) Acc@5 0.000 (0.580)
Epoch: [0][40/10010] Time 8.978 (3.729) Data 8.890 (3.556) Loss 7.0080 (7.1619) Acc@1 0.781 (0.210) Acc@5 0.781 (0.534)
Epoch: [0][50/10010] Time 0.220 (3.661) Data 0.000 (3.492) Loss 6.9565 (7.1282) Acc@1 0.000 (0.199) Acc@5 0.000 (0.597)
Epoch: [0][60/10010] Time 7.797 (3.635) Data 7.710 (3.471) Loss 6.9137 (7.0951) Acc@1 0.000 (0.218) Acc@5 0.000 (0.602)
Epoch: [0][70/10010] Time 0.214 (3.665) Data 0.000 (3.503) Loss 6.9065 (7.0728) Acc@1 0.000 (0.220) Acc@5 0.000 (0.572)
Epoch: [0][80/10010] Time 7.347 (3.636) Data 7.260 (3.477) Loss 6.8719 (7.0524) Acc@1 0.000 (0.212) Acc@5 0.781 (0.637)
Epoch: [0][90/10010] Time 0.216 (3.590) Data 0.000 (3.431) Loss 6.9107 (7.0356) Acc@1 0.000 (0.206) Acc@5 0.781 (0.687)
Epoch: [0][100/10010] Time 9.313 (3.629) Data 9.219 (3.473) Loss 6.9006 (7.0217) Acc@1 0.000 (0.217) Acc@5 0.000 (0.696)
Epoch: [0][110/10010] Time 0.212 (3.577) Data 0.000 (3.421) Loss 6.8484 (7.0093) Acc@1 0.000 (0.211) Acc@5 2.344 (0.739)
Epoch: [0][120/10010] Time 11.809 (3.600) Data 11.722 (3.445) Loss 6.8965 (6.9977) Acc@1 0.781 (0.213) Acc@5 1.562 (0.781)
Epoch: [0][130/10010] Time 0.215 (3.534) Data 0.000 (3.379) Loss 6.8403 (6.9883) Acc@1 0.000 (0.209) Acc@5 0.000 (0.805)
Epoch: [0][140/10010] Time 11.093 (3.551) Data 11.000 (3.400) Loss 6.9016 (6.9800) Acc@1 0.000 (0.199) Acc@5 0.000 (0.803)
Epoch: [0][150/10010] Time 4.364 (3.523) Data 4.276 (3.373) Loss 6.8721 (6.9722) Acc@1 0.000 (0.191) Acc@5 0.000 (0.771)
Epoch: [0][160/10010] Time 9.092 (3.525) Data 9.004 (3.375) Loss 6.8635 (6.9640) Acc@1 0.000 (0.199) Acc@5 0.781 (0.791)
Epoch: [0][170/10010] Time 5.724 (3.507) Data 5.637 (3.359) Loss 6.8689 (6.9573) Acc@1 0.000 (0.201) Acc@5 0.781 (0.777)
Epoch: [0][180/10010] Time 9.218 (3.506) Data 9.124 (3.360) Loss 6.7048 (6.9496) Acc@1 0.781 (0.207) Acc@5 3.125 (0.803)
Epoch: [0][190/10010] Time 3.789 (3.481) Data 3.700 (3.335) Loss 6.8398 (6.9441) Acc@1 0.000 (0.209) Acc@5 0.000 (0.826)
Epoch: [0][200/10010] Time 11.521 (3.492) Data 11.433 (3.347) Loss 6.8196 (6.9367) Acc@1 0.000 (0.218) Acc@5 0.000 (0.875)
Epoch: [0][210/10010] Time 1.611 (3.465) Data 1.523 (3.321) Loss 6.7499 (6.9297) Acc@1 2.344 (0.233) Acc@5 2.344 (0.896)
Epoch: [0][220/10010] Time 11.472 (3.480) Data 11.383 (3.337) Loss 6.7838 (6.9230) Acc@1 0.781 (0.255) Acc@5 1.562 (0.937)
Epoch: [0][230/10010] Time 0.212 (3.443) Data 0.000 (3.299) Loss 6.8092 (6.9169) Acc@1 0.000 (0.257) Acc@5 0.781 (0.944)
Epoch: [0][240/10010] Time 10.698 (3.472) Data 10.610 (3.328) Loss 6.8725 (6.9105) Acc@1 0.000 (0.253) Acc@5 0.000 (0.969)
Epoch: [0][250/10010] Time 0.217 (3.451) Data 0.000 (3.307) Loss 6.8506 (6.9055) Acc@1 0.000 (0.246) Acc@5 0.000 (0.980)
Epoch: [0][260/10010] Time 9.317 (3.456) Data 9.229 (3.312) Loss 6.7118 (6.9010) Acc@1 0.000 (0.263) Acc@5 1.562 (0.988)
Epoch: [0][270/10010] Time 0.212 (3.439) Data 0.000 (3.295) Loss 6.7731 (6.8963) Acc@1 0.781 (0.277) Acc@5 1.562 (1.038)
Epoch: [0][280/10010] Time 11.279 (3.458) Data 11.191 (3.314) Loss 6.8488 (6.8909) Acc@1 0.000 (0.286) Acc@5 0.781 (1.054)
Epoch: [0][290/10010] Time 0.214 (3.436) Data 0.000 (3.292) Loss 6.7565 (6.8860) Acc@1 0.000 (0.290) Acc@5 0.781 (1.079)
Epoch: [0][300/10010] Time 12.405 (3.458) Data 12.317 (3.313) Loss 6.7233 (6.8805) Acc@1 0.000 (0.298) Acc@5 1.562 (1.121)
Epoch: [0][310/10010] Time 0.213 (3.426) Data 0.000 (3.282) Loss 6.7484 (6.8755) Acc@1 0.000 (0.306) Acc@5 2.344 (1.156)
Epoch: [0][320/10010] Time 13.653 (3.442) Data 13.559 (3.298) Loss 6.7439 (6.8712) Acc@1 0.000 (0.309) Acc@5 1.562 (1.173)
Epoch: [0][330/10010] Time 0.212 (3.418) Data 0.000 (3.273) Loss 6.7267 (6.8670) Acc@1 0.781 (0.314) Acc@5 2.344 (1.204)
Epoch: [0][340/10010] Time 13.209 (3.435) Data 13.121 (3.289) Loss 6.7553 (6.8636) Acc@1 1.562 (0.314) Acc@5 1.562 (1.210)
Epoch: [0][350/10010] Time 0.218 (3.412) Data 0.000 (3.265) Loss 6.6885 (6.8588) Acc@1 0.000 (0.318) Acc@5 3.125 (1.249)
Epoch: [0][360/10010] Time 12.825 (3.427) Data 12.731 (3.280) Loss 6.6241 (6.8540) Acc@1 0.000 (0.316) Acc@5 2.344 (1.260)
Epoch: [0][370/10010] Time 0.213 (3.408) Data 0.000 (3.260) Loss 6.8046 (6.8504) Acc@1 0.000 (0.322) Acc@5 1.562 (1.289)
Epoch: [0][380/10010] Time 11.702 (3.415) Data 11.615 (3.267) Loss 6.7234 (6.8459) Acc@1 1.562 (0.334) Acc@5 2.344 (1.312)
Epoch: [0][390/10010] Time 0.218 (3.399) Data 0.000 (3.249) Loss 6.7012 (6.8410) Acc@1 0.000 (0.342) Acc@5 2.344 (1.343)
Epoch: [0][400/10010] Time 12.231 (3.413) Data 12.144 (3.264) Loss 6.7159 (6.8370) Acc@1 0.000 (0.343) Acc@5 1.562 (1.356)
Epoch: [0][410/10010] Time 0.213 (3.396) Data 0.000 (3.245) Loss 6.5088 (6.8320) Acc@1 0.000 (0.348) Acc@5 3.125 (1.382)
Epoch: [0][420/10010] Time 12.972 (3.407) Data 12.883 (3.256) Loss 6.6504 (6.8275) Acc@1 0.781 (0.349) Acc@5 4.688 (1.403)
Epoch: [0][430/10010] Time 0.212 (3.393) Data 0.000 (3.242) Loss 6.6490 (6.8246) Acc@1 0.000 (0.352) Acc@5 3.906 (1.434)
Epoch: [0][440/10010] Time 11.984 (3.406) Data 11.896 (3.255) Loss 6.7207 (6.8209) Acc@1 0.781 (0.358) Acc@5 1.562 (1.465)
Epoch: [0][450/10010] Time 0.212 (3.387) Data 0.000 (3.235) Loss 6.5495 (6.8161) Acc@1 0.000 (0.357) Acc@5 0.000 (1.483)
Epoch: [0][460/10010] Time 11.841 (3.396) Data 11.748 (3.244) Loss 6.6327 (6.8123) Acc@1 0.781 (0.364) Acc@5 4.688 (1.527)
Epoch: [0][470/10010] Time 0.212 (3.383) Data 0.000 (3.231) Loss 6.5489 (6.8081) Acc@1 0.781 (0.370) Acc@5 7.031 (1.558)
Epoch: [0][480/10010] Time 8.418 (3.389) Data 8.331 (3.237) Loss 6.6245 (6.8034) Acc@1 0.781 (0.377) Acc@5 1.562 (1.569)
Epoch: [0][490/10010] Time 0.211 (3.388) Data 0.000 (3.237) Loss 6.6849 (6.7994) Acc@1 1.562 (0.380) Acc@5 2.344 (1.593)
Epoch: [0][500/10010] Time 6.984 (3.388) Data 6.890 (3.237) Loss 6.4890 (6.7949) Acc@1 0.781 (0.379) Acc@5 3.906 (1.616)
Epoch: [0][510/10010] Time 0.212 (3.391) Data 0.000 (3.239) Loss 6.6416 (6.7910) Acc@1 0.781 (0.382) Acc@5 2.344 (1.642)
Epoch: [0][520/10010] Time 2.660 (3.382) Data 2.572 (3.231) Loss 6.5715 (6.7870) Acc@1 0.781 (0.385) Acc@5 1.562 (1.660)
Epoch: [0][530/10010] Time 0.212 (3.388) Data 0.000 (3.236) Loss 6.5645 (6.7825) Acc@1 0.781 (0.393) Acc@5 2.344 (1.680)
Epoch: [0][540/10010] Time 1.908 (3.379) Data 1.820 (3.228) Loss 6.4077 (6.7779) Acc@1 2.344 (0.394) Acc@5 3.906 (1.692)
Epoch: [0][550/10010] Time 0.213 (3.381) Data 0.000 (3.230) Loss 6.5599 (6.7736) Acc@1 0.000 (0.397) Acc@5 0.781 (1.704)
Epoch: [0][560/10010] Time 0.856 (3.369) Data 0.768 (3.218) Loss 6.6386 (6.7695) Acc@1 0.781 (0.401) Acc@5 1.562 (1.732)
Epoch: [0][570/10010] Time 0.229 (3.377) Data 0.000 (3.226) Loss 6.5827 (6.7652) Acc@1 0.781 (0.409) Acc@5 3.125 (1.760)
Epoch: [0][580/10010] Time 0.975 (3.364) Data 0.887 (3.213) Loss 6.4518 (6.7610) Acc@1 0.781 (0.413) Acc@5 5.469 (1.779)
Epoch: [0][590/10010] Time 0.212 (3.370) Data 0.000 (3.219) Loss 6.5656 (6.7565) Acc@1 0.000 (0.428) Acc@5 2.344 (1.823)
Epoch: [0][600/10010] Time 0.212 (3.355) Data 0.046 (3.203) Loss 6.4239 (6.7520) Acc@1 0.781 (0.437) Acc@5 3.125 (1.851)
Epoch: [0][610/10010] Time 0.211 (3.363) Data 0.000 (3.212) Loss 6.3226 (6.7474) Acc@1 1.562 (0.445) Acc@5 6.250 (1.880)
Epoch: [0][620/10010] Time 0.214 (3.350) Data 0.000 (3.198) Loss 6.5112 (6.7432) Acc@1 1.562 (0.452) Acc@5 5.469 (1.906)
Epoch: [0][630/10010] Time 0.226 (3.354) Data 0.000 (3.201) Loss 6.4474 (6.7382) Acc@1 0.781 (0.458) Acc@5 3.125 (1.946)
Epoch: [0][640/10010] Time 0.211 (3.341) Data 0.000 (3.188) Loss 6.5718 (6.7347) Acc@1 0.781 (0.463) Acc@5 3.906 (1.967)
Epoch: [0][650/10010] Time 0.214 (3.347) Data 0.000 (3.194) Loss 6.5053 (6.7297) Acc@1 0.781 (0.472) Acc@5 1.562 (2.008)
Epoch: [0][660/10010] Time 0.212 (3.343) Data 0.000 (3.189) Loss 6.3718 (6.7246) Acc@1 0.781 (0.482) Acc@5 3.906 (2.044)
Epoch: [0][670/10010] Time 0.223 (3.366) Data 0.000 (3.212) Loss 6.3855 (6.7196) Acc@1 0.781 (0.496) Acc@5 3.906 (2.095)
Epoch: [0][680/10010] Time 0.212 (3.358) Data 0.000 (3.204) Loss 6.5520 (6.7149) Acc@1 0.781 (0.507) Acc@5 3.906 (2.129)
Epoch: [0][690/10010] Time 0.212 (3.370) Data 0.000 (3.216) Loss 6.3960 (6.7098) Acc@1 2.344 (0.510) Acc@5 7.031 (2.156)
Epoch: [0][700/10010] Time 0.214 (3.360) Data 0.000 (3.205) Loss 6.4797 (6.7055) Acc@1 0.781 (0.519) Acc@5 2.344 (2.190)
Epoch: [0][710/10010] Time 0.227 (3.368) Data 0.000 (3.212) Loss 6.3497 (6.7008) Acc@1 3.125 (0.531) Acc@5 4.688 (2.217)
Epoch: [0][720/10010] Time 0.213 (3.358) Data 0.000 (3.203) Loss 6.3555 (6.6961) Acc@1 2.344 (0.543) Acc@5 6.250 (2.256)
Epoch: [0][730/10010] Time 0.207 (3.376) Data 0.000 (3.220) Loss 6.5028 (6.6923) Acc@1 0.000 (0.544) Acc@5 2.344 (2.267)
Epoch: [0][740/10010] Time 0.210 (3.365) Data 0.000 (3.209) Loss 6.2173 (6.6880) Acc@1 2.344 (0.557) Acc@5 5.469 (2.313)
Epoch: [0][750/10010] Time 0.209 (3.372) Data 0.000 (3.215) Loss 6.5205 (6.6841) Acc@1 0.000 (0.564) Acc@5 2.344 (2.335)
Epoch: [0][760/10010] Time 0.209 (3.359) Data 0.000 (3.202) Loss 6.2149 (6.6788) Acc@1 1.562 (0.571) Acc@5 6.250 (2.367)
Epoch: [0][770/10010] Time 0.209 (3.364) Data 0.000 (3.207) Loss 6.4612 (6.6749) Acc@1 1.562 (0.586) Acc@5 3.906 (2.403)
Epoch: [0][780/10010] Time 0.208 (3.353) Data 0.000 (3.196) Loss 6.3526 (6.6705) Acc@1 0.000 (0.598) Acc@5 3.906 (2.439)
Epoch: [0][790/10010] Time 0.210 (3.359) Data 0.000 (3.202) Loss 6.2106 (6.6650) Acc@1 0.781 (0.607) Acc@5 3.906 (2.469)
Epoch: [0][800/10010] Time 0.209 (3.353) Data 0.000 (3.195) Loss 6.1517 (6.6601) Acc@1 3.906 (0.615) Acc@5 8.594 (2.503)
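A quick way to read the log above: the numbers in parentheses are running averages, and by iteration 800 `Data` accounts for ~3.2 s of the ~3.35 s total per iteration, i.e. the GPU sits idle roughly 95% of the time waiting for the dataloader. A small stdlib-only parser (the log line is copied from above) makes this explicit:

```python
import re

# Extract the running averages (the values in parentheses) for Time and
# Data, then report what fraction of each iteration is data loading.
line = ("Epoch: [0][800/10010] Time 0.209 (3.353) "
        "Data 0.000 (3.195) Loss 6.1517 (6.6601)")
m = re.search(r"Time [\d.]+ \(([\d.]+)\)\s+Data [\d.]+ \(([\d.]+)\)", line)
time_avg, data_avg = map(float, m.groups())
print(f"data loading: {100 * data_avg / time_avg:.1f}% of iteration time")
# → data loading: 95.3% of iteration time
```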
After I saw this issue, I tried installing pytorch-0.4.1 (python3.6, cuda 9.0) and reran the code, but I got the following error message:
main.py:87: UserWarning: You have chosen a specific GPU. This will completely disable data parallelism.
warnings.warn('You have chosen a specific GPU. This will completely '
=> creating model 'resnet18'
Traceback (most recent call last):
File "main.py", line 344, in <module>
main()
File "main.py", line 152, in main
normalize,
File "/home/sirius/document/siriusShare/Clustering-Face/arcface-pytorch-master/code/Efficient-PyTorch-master/tools/folder2lmdb.py", line 31, in __init__
self.length =pa.deserialize(txn.get(b'__len__'))
File "pyarrow/serialization.pxi", line 458, in pyarrow.lib.deserialize
File "pyarrow/serialization.pxi", line 420, in pyarrow.lib.deserialize_from
File "pyarrow/serialization.pxi", line 397, in pyarrow.lib.read_serialized
File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Cannot read a negative number of bytes from BufferReader.
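One plausible cause of this `ArrowIOError` (an assumption, not confirmed by the traceback alone): `pa.serialize`/`pa.deserialize` were version-sensitive and were later deprecated and removed from pyarrow, so an LMDB written with one pyarrow version can fail to deserialize under another. A pickle-based replacement for both the write and read sides avoids the dependency entirely; note the LMDB must be regenerated with the same serializer used for reading. `dumps`/`loads` are illustrative helper names:

```python
import pickle

def dumps(obj):
    # stdlib replacement for pa.serialize(obj).to_buffer()
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

def loads(buf):
    # stdlib replacement for pa.deserialize(buf); LMDB may return a
    # memoryview/buffer, so normalize to bytes first
    return pickle.loads(bytes(buf))

# e.g. in folder2lmdb.py's __init__:
#   self.length = loads(txn.get(b'__len__'))
```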
Hi, can you tell us which python, cuda, pytorch, and pyarrow versions you were using? Thanks very much for your help. (I've spent weeks trying to solve this problem; I tried HDF5 and DALI before, but neither fixed it, since even the official ImageNet classification training shows GPU utilization oscillating 100%, 0%, 100%, 0%, 100%, 0%, ...)
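The 100%/0% oscillation described above is the classic signature of a data-loading bottleneck: the GPU drains the prefetched batches, then stalls until workers catch up. Independent of the storage backend, these DataLoader settings usually help hide decode time (a sketch, not a guaranteed fix; `persistent_workers` and `prefetch_factor` only exist in torch >= 1.7):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy in-memory dataset standing in for the LMDB-backed one
ds = TensorDataset(torch.randn(64, 3, 8, 8), torch.randint(0, 10, (64,)))

loader = DataLoader(
    ds,
    batch_size=16,
    shuffle=True,
    num_workers=4,            # enough workers to overlap decode with compute
    pin_memory=True,          # faster host-to-GPU copies
    persistent_workers=True,  # keep workers alive across epochs
    prefetch_factor=4,        # batches fetched ahead per worker
)
```

If `Data` time stays high even with many workers, the bottleneck is usually per-sample I/O or decode cost, which is where storing small encoded buffers in LMDB pays off.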
I generated an LMDB with the code above and used DetectionLMDB as the dataset, but it is very slow and I don't know why. Does it have to be combined with DDP?