SpursLipu / YOLOv3v4-ModelCompression-MultidatasetTraining-Multibackbone

YOLO ModelCompression MultidatasetTraining
GNU General Public License v3.0

Out of Memory during pruning #87

Open ColinPeeris opened 3 years ago

ColinPeeris commented 3 years ago

Hi.

I'm trying to do pruning on a yolov4 model. I've done all the steps. However, I keep running out of memory. I'm using a T4 card with 15 GB of memory.

Am I missing something? Thanks.

ColinPeeris commented 3 years ago

Hi,

For the training code, I pass the argument "--device 0,1" to use both my T4 cards. Both devices are detected by the script. However, I get an OOM error again because the script only utilizes one GPU.

I get this as output: [screenshot]

At the point of crash, only one GPU was filled: [screenshot]

I believe both GPUs are detected by the script, but only one gets used to load the model. Am I missing something?

Thanks.

chumingqian commented 3 years ago

Hi @ColinPeeris: there are two places we may need to change when running XXX_prune.py:

1. The batch_size in def test() in test.py. When we run XXX_prune.py, eval_model calls the test function in test.py; for example, in shortcut_prune.py, line 162, eval_model calls test. So we can change the batch_size in def test() from its default of 16 down to 8.

2. Taking shortcut_prune.py as an example again, at line 221 there is random_input = torch.rand((16, 3, 416, 416)).to(device); we can also change this 16 to 8 (see the sketch below).
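For reference, the second change would look roughly like this (a sketch based on the line number quoted above; the exact location may differ between versions of the repo, and device is the variable already defined earlier in shortcut_prune.py):

```python
# shortcut_prune.py, around line 221: shrink the dummy input used for
# the post-pruning forward pass from a batch of 16 images to 8.
random_input = torch.rand((8, 3, 416, 416)).to(device)  # was (16, 3, 416, 416)
```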

ColinPeeris commented 3 years ago

Hi @chumingqian,

Thanks so much for your reply.

Just 2 more questions:

1) To do pruning, must I do all 3 steps prescribed in the link (i.e. Training, Sparse Training, Pruning) even if I already have a model in mind? I want to prune yolov4. Can I just skip ahead to step 3 (Pruning)?

2) One of my issues is with train.py. I use the flag "--device 0,1" to set both devices, since I have 2 T4 cards; however, only 1 GPU is used (see screenshot below). I have since reduced the batch size using the flag "--batch-size 8" and it can run. I'm just wondering why the flag to set both devices isn't working. My full command is:

python train.py --data "data\coco2017.data" -pt --batch-size 8 --weights "weights/yolov3/yolov3.weights" --cfg "cfg/yolov3/yolov3.cfg" --device 0,1

[screenshot]

chumingqian commented 3 years ago

Hi @ColinPeeris ,

1. It is recommended to do the sparse training step. The purpose of sparse training is to sparsify the gamma coefficients of the BN layers, pushing them toward zero. Via sparse training, we can reduce the drop in mAP@0.5 caused by pruning (see the sketch at the end of this comment).

   For instance, suppose our task is to detect 20 classes on a dataset. In this case, pruning without sparse training brings a large accuracy drop, while sparse training before pruning still brings a drop, but a much smaller one. If, on the other hand, our task is to detect a single class, the difference in accuracy drop with or without sparse training may not be large.

   In short, when your task and dataset are complex and large, I would recommend you do the sparse training.

[figure: gamma coefficients]

2. I will check it later.
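Here is a minimal sketch of what BN-gamma sparse training does, assuming the usual network-slimming recipe behind this repo's -sr/--s flags (the helper name apply_bn_sparsity is mine, not the repo's):

```python
import torch
import torch.nn as nn

def apply_bn_sparsity(model, s=0.0005):
    """Hypothetical helper: L1 subgradient penalty on every BN gamma.

    Call after loss.backward() and before optimizer.step(); it nudges
    the gamma of unimportant channels toward zero, so those channels
    can later be pruned with little mAP@0.5 loss.
    """
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.data.add_(s * torch.sign(m.weight.data))
```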
chumingqian commented 3 years ago

Hi @ColinPeeris: 2. Could you try the following command, adding double quotation marks around the device list?

python train.py --data "data\coco2017.data" -pt --batch-size 8 --weights "weights/yolov3/yolov3.weights" --cfg "cfg/yolov3/yolov3.cfg" --device "0,1"

If that works, let me know.

ColinPeeris commented 3 years ago

Hi @chumingqian,

I tried using double quotation marks as well, but it still doesn't work. It looks to me like the code does read the 0 and 1 correctly, hence the "Using CUDA device..." message highlighted in the screenshot below. It just cannot distribute the load; perhaps it fails at a later step, in the "DistributedDataParallel" call (it doesn't work when changed to "DataParallel" either).

[screenshot]

If you're able to get the work distributed seamlessly between the 2 GPUs, perhaps it's a PyTorch version issue? I'm using torch 1.4.0 and torchvision 0.5.0.
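For what it's worth, the usual single-machine multi-GPU wrapping in YOLOv3-derived trainers looks roughly like this (a sketch of the general pattern, assuming `model` is the constructed network; not necessarily this repo's exact code):

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0")
model = model.to(device)  # parameters must live on the first GPU

if torch.cuda.device_count() > 1:
    # DataParallel replicates the model and splits each batch across
    # the visible GPUs; if only GPU 0 fills up, this wrap is being
    # skipped or the second device is not visible to the process.
    model = nn.DataParallel(model)
```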

chumingqian commented 3 years ago

Hi @ColinPeeris, I am using a single GPU, and my environment is Ubuntu 18.04 with PyTorch 1.8.0; it works fine there.

chumingqian commented 3 years ago

Hi @ColinPeeris, before you install a different PyTorch version, could you run the following in a Python shell and send me the output: import torch, then torch.cuda.is_available(), torch.cuda.device_count(), and torch.cuda.current_device()?
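That is (the calls themselves are standard PyTorch; only the print wrappers are added here for convenience):

```python
import torch

print(torch.cuda.is_available())    # True if CUDA is usable at all
print(torch.cuda.device_count())    # should print 2 for two T4 cards
print(torch.cuda.current_device())  # index of the currently selected GPU
```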

ColinPeeris commented 3 years ago

Hi @chumingqian ,

Here's the information you requested.

[screenshot]

chumingqian commented 3 years ago

Hi @ColinPeeris, on Ubuntu I can set CUDA_VISIBLE_DEVICES=0,1 by adding the line "export CUDA_VISIBLE_DEVICES=0,1" to my bashrc file (opened from the command line with sudo gedit ~/.bashrc), then running source ~/.bashrc.
On Windows: can you try adding CUDA_VISIBLE_DEVICES=0,1 (without quotation marks) to the PC's environment variables, then save, restart the PC, and try again?
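A cross-platform alternative, if editing shell or system environment variables is awkward, is to set the variable from Python at the very top of train.py (an assumption about placement: it must run before anything initializes CUDA):

```python
import os

# Must be set before CUDA is initialized, so keep this at the very top
# of the entry script, ahead of any torch.cuda call.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
```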

ly0303521 commented 3 years ago

When I train yolov4, the mAP drops to 0. Any ideas? My command:

python train.py --data missions/trash_lid_v4/trash_lid_v4.data \
  --batch-size 3 \
  -pt --weights pretrained/yolov4.weights \
  --cfg missions/trash_lid_v4/yolov4-custom_trash_lid.cfg \
  -sr --s 0.0005 \
  --wdir weights_trash_lid_v4_p0 \
  --prune 0 \
  --device 3 \
  --img-size 608 \
  --adam

chumingqian commented 3 years ago

Hi Ly: you can try SGD first, i.e. the default, without setting --adam. Also, --weights takes the weights path, but --wdir?? Where does that argument come from?

ly0303521 commented 3 years ago

> Hi Ly: you can try SGD first, i.e. the default, without setting --adam. Also, --weights takes the weights path, but --wdir?? Where does that argument come from?

Hi, --wdir is a flag I added myself to specify the directory where weights are saved. I already tried SGD and the mAP still drops to 0. I'm now lowering the sparsity parameter a bit, setting it to --s 0.0005, and will see how it goes.