NVIDIA-AI-IOT / face-mask-detection

Face Mask Detection using NVIDIA Transfer Learning Toolkit (TLT) and DeepStream for COVID-19
MIT License

Pretrained model #12

Closed fiv21 closed 3 years ago

fiv21 commented 3 years ago

Hi! I would like to ask if you can share the pre-trained model for Jetson along with the calibration file, i.e. a .zip with the model you trained, so I can prune it on my side. The main problem I have is not achieving the same mAP values even when I retrain. Thanks in advance! All the best,

Franco.

ak-nv commented 3 years ago

We cannot provide a pre-trained model for face-mask-detection. What mAP are you achieving currently? What are the errors? Depending on your batch size and hyper-parameters, your mAP might drop.

fiv21 commented 3 years ago

Okay, I'll try with a bigger batch size; I trained with a GTX 1070 and batch size 8, and that's probably the reason for the mAP problem on no-mask. By the way, with a bigger batch size the error was a memory allocation error. So, if you can't provide the trained model, maybe you can help me with the optimal parameters for an i7 CPU + 16 GB RAM + GTX 1070. Another option I'm considering is Google Colab, or allocating a VM in Azure or AWS to use a Tesla V100. Any suggestions or help are appreciated. Thanks in advance for your help.

ak-nv commented 3 years ago

> By the way, with a bigger batch size the error was a memory allocation error.

With batch size 24 you will need about 15 GB of GPU memory, whereas once you prune the model (as in this repo, ~12% prune ratio) you can get by with 8 GB.
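As a rough back-of-envelope (my own illustration, not a TLT utility, assuming training memory scales roughly linearly with batch size and anchoring on the batch-24 / 15 GB data point above):

```python
# Naive estimate: assume training memory grows ~linearly with batch size.
# Anchored on the data point from this thread: batch size 24 -> ~15 GB.
GB_AT_BATCH_24 = 15.0

def estimated_memory_gb(batch_size, gb_at_24=GB_AT_BATCH_24):
    """Very rough linear scaling of training memory with batch size."""
    return gb_at_24 * batch_size / 24

print(estimated_memory_gb(8))   # -> 5.0, roughly what fits on a GTX 1070
```

In practice memory is not purely linear (weights and framework overhead are fixed costs), so treat this only as a first guess.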

I have personally tried AWS with a Tesla V100 for this task and it worked out pretty well. I have not tried any other cloud instances yet, so I cannot comment on them.

fiv21 commented 3 years ago

Okay, but I didn't get one point: when you talk about pruning the model, I think you're talking about inference, but my problem is during training, or maybe I misunderstood you; sorry for that. Again, if I can train with my own GPU and then move the model to my AGX Xavier, that would be great. Just for context: I'm trying to learn how to build the workflow pipeline with my personal computer and the Xavier before moving to the cloud. Following the idea of this repo, if I can train the model on my GTX 1070 and move it to the Jetson for inference, then the mission is accomplished for now. After that I can think about deploying via Docker or something similar.

On the other hand, if my GPU memory is too low for this experiment, I think it could be added as a requirement in the repo to prevent future problems for someone else. As a possible solution, do you have any (theoretical) idea whether this process could be done on Google Colab? After all, it offers GPU-enabled runtimes with Tesla T4s that could be very useful for my purpose.

ak-nv commented 3 years ago

I feel it should be doable even with a GTX 1070, but you need to experiment with the other hyper-parameters as well when you reduce the batch size; in detectnet_v2_train_resnet18_kitti.txt look at model_config (https://github.com/NVIDIA-AI-IOT/face-mask-detection/blob/master/tlt_specs/detectnet_v2_train_resnet18_kitti.txt#L73). I previously had good luck even with batch size 16. As the batch size goes down, your training time will also go up.
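For reference, the batch-size and learning-rate knobs live in the spec's training_config block; a fragment of what that section looks like (these values are illustrative, not the repo's defaults):

```
training_config {
  batch_size_per_gpu: 8           # lower this to fit a smaller GPU
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04    # often scaled down together with batch size
      soft_start: 0.1
      annealing: 0.7
    }
  }
}
```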

> When you say prune the model, I think you're talking about inference.

In TLT, we train first; once we achieve satisfactory accuracy, we prune the model and then retrain the pruned model. See steps 3 to 8 of the TLT workflow (https://github.com/NVIDIA-AI-IOT/face-mask-detection#nvidia-transfer-learning-toolkit-tlt-training-flow-).
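The train, prune, retrain loop those steps refer to looks roughly like this in the TLT 2.x CLI (a sketch, not the exact notebook cells; paths, spec filenames, and the -pth value here are placeholders, so check the repo notebook for the real invocations):

```shell
# Steps 3-4: initial training and evaluation
tlt-train detectnet_v2 -e detectnet_v2_train_resnet18_kitti.txt \
          -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
          -k $KEY -n resnet18_detector
tlt-evaluate detectnet_v2 -e detectnet_v2_train_resnet18_kitti.txt \
          -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
          -k $KEY

# Step 5: prune the trained model
tlt-prune -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
          -o $USER_EXPERIMENT_DIR/experiment_dir_pruned/resnet18_nopool_bn_detectnet_v2_pruned.tlt \
          -eq union -pth 0.12 -k $KEY

# Steps 6-8: retrain the pruned model with the retrain spec, then re-evaluate
tlt-train detectnet_v2 -e detectnet_v2_retrain_resnet18_kitti.txt \
          -r $USER_EXPERIMENT_DIR/experiment_dir_retrain \
          -k $KEY -n resnet18_detector_pruned
```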

fiv21 commented 3 years ago

Oh, I get it now! I'll play around with that this weekend and update this issue if I succeed. Thanks for your help!

fiv21 commented 3 years ago

Okay, working with the GTX 1070 I didn't get a good result in any way, and I don't know exactly why. My suspicion points to my dataset and probably the batch size, but I had no luck finding a scientific answer. So, to stop wasting time on my computer, I used a Tesla V100 on a VM, and here comes the interesting part: I rebuilt the dataset as you suggested in the repo, and now the unpruned model hits:

Validation cost: 0.001015
Mean average_precision (in %): 70.6842

class name      average precision (in %)
------------  --------------------------
mask                             81.1487
no-mask                          60.2197

Median Inference Time: 0.005730
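(As a quick sanity check on my side, just arithmetic and not a TLT tool: the reported mean AP is the plain average of the two per-class APs.)

```python
# Mean AP over the two classes of the unpruned run above.
ap_mask = 81.1487
ap_no_mask = 60.2197

mean_ap = (ap_mask + ap_no_mask) / 2
print(round(mean_ap, 4))  # -> 70.6842, matching the reported value
```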

After this I thought: okay, let's try retraining the pruned model with the default settings from the notebook, and I got this:

Validation cost: 0.000977
Mean average_precision (in %): 65.6631

class name      average precision (in %)
------------  --------------------------
mask                             60.5674
no-mask                          70.7589

Median Inference Time: 0.003357

I think I had misunderstood something, so I decided to make some changes.

First, I increased the number of images for each class to 12,000, and the results are quite a bit better:

Validation cost: 0.000234
Mean average_precision (in %): 80.8005

class name      average precision (in %)
------------  --------------------------
mask                             84.494
no-mask                          77.1069

Median Inference Time: 0.005580

Then I retrained with a tweak in the prune step; I changed this cell:

!tlt-prune -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned_12k/weights/resnet18_detector.tlt \
           -o $USER_EXPERIMENT_DIR/experiment_dir_pruned_12k_test2/resnet18_nopool_bn_detectnet_v2_pruned.tlt \
           -eq union \
           -pth 0.12 \
           -k $KEY

I changed -pth from 0.8 to 0.12 and got this result:

=========================

Validation cost: 0.000245
Mean average_precision (in %): 80.6542

class name      average precision (in %)
------------  --------------------------
mask                             84.6146
no-mask                          76.6938

Median Inference Time: 0.004232
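Since -pth trades model size against accuracy, my next idea is to sweep it and evaluate each pruned model separately (a sketch reusing the paths from the cell above; the per-threshold output directories are hypothetical):

```shell
# Sweep pruning thresholds; each pruned model goes to its own directory.
for PTH in 0.05 0.12 0.30 0.50 0.80; do
  mkdir -p $USER_EXPERIMENT_DIR/experiment_dir_pruned_pth_${PTH}
  tlt-prune -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned_12k/weights/resnet18_detector.tlt \
            -o $USER_EXPERIMENT_DIR/experiment_dir_pruned_pth_${PTH}/resnet18_nopool_bn_detectnet_v2_pruned.tlt \
            -eq union \
            -pth ${PTH} \
            -k $KEY
done
```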

At this point I don't know exactly what to change or where to focus the effort to get closer to your results; if you can help me with this, I'll really appreciate it. If needed, I can share my TFRecords to reproduce the training. Thanks for your time!

NOTE: You should add a warning in the repo about the path changes needed in the config files, for those who, like me, don't know much about TLT and related tooling.

EDIT: I forgot to clarify the configuration. I used the default config files provided in this repo, changing only the paths where the models and data are located.

ak-nv commented 3 years ago

@fiv21 Thanks for working through this and for the recommendations. I am planning to add more detailed steps and examples with a sample video in the upcoming weeks, if that helps.

fiv21 commented 3 years ago

That's great! It would help me understand this a little better. It's also interesting because it could serve as a base example for other cases, for instance checking whether a person is wearing a safety helmet. I'll close the issue and wait for new info in the repo to improve the results, or for the right way to achieve the same accuracy. Thanks in advance!