JDAI-CV / fast-reid

SOTA Re-identification Methods and Toolbox
Apache License 2.0

Support CUDA 11 and CUDA 10 with some Clean Up #687

Open KleinYuan opened 1 year ago

KleinYuan commented 1 year ago

The current Docker setup has two issues:

  1. The Docker image cannot run on CUDA 11, which means it does not work on any Ampere-architecture GPU, such as the 3070, 3080, ...
  2. The documented docker run command is broken: starting the container with /bin/sh leaves pip unavailable.
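
To illustrate the first issue, here is a minimal Python sketch of why a CUDA 10 image cannot target an Ampere GPU. The table lists only the architectures relevant here, and the function name cuda_supports, the table name, and the specific image versions (10.2 and 11.1) are assumptions made for illustration; they are not taken from this PR or from fast-reid.

```python
# Minimum CUDA toolkit version that can target each GPU compute capability.
# Only the architectures relevant to this PR are listed; not exhaustive.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 5): (10, 0),  # Turing (e.g. RTX 20xx)
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 3070, 3080, ...)
}


def cuda_supports(capability, cuda_version):
    """Return True if the CUDA toolkit version can target the capability.

    Both arguments are (major, minor) tuples, so plain tuple comparison
    gives the correct version ordering.
    """
    minimum = MIN_CUDA_FOR_CAPABILITY[capability]
    return cuda_version >= minimum


# An RTX 3070 reports compute capability 8.6.
# The image versions below are hypothetical examples.
print(cuda_supports((8, 6), (10, 2)))  # a CUDA 10.2 image
print(cuda_supports((8, 6), (11, 1)))  # a CUDA 11.1 image
```

The first call prints False and the second prints True, which matches the need for a separate CUDA 11 image on these cards.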

This PR has been fully tested on a machine with a 3070, where training runs successfully:

[01/27 22:31:39 fastreid.utils.checkpoint]: No checkpoint found. Training model from scratch
[01/27 22:31:39 fastreid.engine.train_loop]: Starting training from epoch 0
[01/27 22:32:24 fastreid.utils.events]:  eta: 1:21:55  epoch/iter: 0/199  total_loss: 7.745  loss_cls: 6.461  loss_triplet: 1.292  time: 0.2043  data_time: 0.0013  lr: 6.60e-05  max_mem: 4862M
[01/27 22:32:24 fastreid.utils.events]:  eta: 1:21:55  epoch/iter: 0/201  total_loss: 7.726  loss_cls: 6.445  loss_triplet: 1.26  time: 0.2043  data_time: 0.0010  lr: 6.63e-05  max_mem: 4862M
[01/27 22:33:08 fastreid.utils.events]:  eta: 1:23:00  epoch/iter: 1/399  total_loss: 5.311  loss_cls: 4.884  loss_triplet: 0.4171  time: 0.2082  data_time: 0.0010  lr: 9.75e-05  max_mem: 4862M
[01/27 22:33:09 fastreid.utils.events]:  eta: 1:23:00  epoch/iter: 1/403  total_loss: 5.273  loss_cls: 4.852  loss_triplet: 0.4111  time: 0.2085  data_time: 0.0010  lr: 9.82e-05  max_mem: 4862M
[01/27 22:33:58 fastreid.utils.events]:  eta: 1:23:21  epoch/iter: 2/599  total_loss: 3.677  loss_cls: 3.44  loss_triplet: 0.227  time: 0.2194  data_time: 0.0007  lr: 1.29e-04  max_mem: 4862M

This PR includes the following changes:

  1. Add a CUDA 11 Dockerfile.
  2. Move the Dockerfile to the root folder.
  3. Update the documentation for the docker commands.
  4. Remove the user management, which is not necessary.