Performance tweaking multi detect

BerendWijers commented 2 years ago

Is your feature request related to a problem? Please describe.

According to the documentation, previous versions of DeepSqueak let us adjust the chunk length for analysis to tweak performance. This option seems to have disappeared. I suspect that DeepSqueak is not set up to use the full GPU: only a small portion of my GPU is being used, whereas I would like to use it to its fullest capacity. I would therefore like to be able to adjust these settings. See the additional context for performance results during runtime. I verified that MATLAB is using the GPU by checking with nvidia-smi which processes are running on it.

Describe the solution you'd like

Restore the option to set performance settings, such as the analysis chunk length, and other options if applicable.

Describe alternatives you've considered

I changed the arguments of the detect call to force MEX code generation and the GPU execution environment, which gave no additional performance gain:

    [bboxes, scores, Class] = detect(network, im2uint8(im), 'ExecutionEnvironment','gpu','SelectStrongest',1,'Acceleration','mex'); % added explicit GPU and MEX acceleration

Additional context

The hardware and software I run DeepSqueak with:

Software:

Windows Server 2019
Matlab R2021a, Update 7
- MATLAB support for MinGW-w64 C/C++ compiler
- GPU Coder interface for deep learning libraries
- Statistics and Machine Learning Toolbox
- Signal Processing Toolbox
- Parallel Computing Toolbox
- Image Processing Toolbox
- Deep Learning Toolbox
- Curve Fitting Toolbox
- Computer Vision Toolbox
DeepSqueak v3.04
NVIDIA driver version 516.31

Hardware:

Virtual Machine
- CPU: Intel Xeon Gold 6342 2.80 GHz, 11 cores
- RAM: 88 GB
- GPU: NVIDIA A10 Tensor Core GPU

Files: https://filesender.surf.nl/?s=download&token=721de2d1-e231-4ee9-988f-7d372cc77f73

Monitoring (snapshot) of GPU usage during a DeepSqueak run with audio file 1:

[Image: deepsqueak_3 0 4_A10_gpu_usage]
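For readers who want numbers rather than a screenshot, GPU usage during a run can be sampled with nvidia-smi's CSV query mode. The following is only a minimal sketch and was not part of the original report: the helper names (`parse_gpu_sample`, `sample_gpu`) are hypothetical, and it assumes nvidia-smi is on the PATH. Some virtualized setups report `[N/A]` for certain fields, so those are mapped to `None`.

```python
import subprocess

# Fields requested from nvidia-smi; these are standard --query-gpu properties.
QUERY_FIELDS = ["utilization.gpu", "memory.used", "memory.total"]


def _to_float(value):
    """Convert a CSV field to float; return None for values like '[N/A]'."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_gpu_sample(line):
    """Parse one line of 'csv,noheader,nounits' output into a dict."""
    util, mem_used, mem_total = (_to_float(v.strip()) for v in line.split(","))
    return {
        "util_percent": util,
        "mem_used_mib": mem_used,
        "mem_total_mib": mem_total,
    }


def sample_gpu():
    """Query nvidia-smi once and return one dict per GPU (hypothetical helper)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_gpu_sample(l) for l in result.stdout.splitlines() if l.strip()]
```

Calling `sample_gpu()` in a loop with a short sleep while detection runs would give a utilization trace that can be compared between settings.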

DrCoffey commented 2 years ago

Hey @BerendWijers, I removed the chunk length option because it was causing detection variability. The YOLO networks are trained using a set image size, so I now detect at that size. When you train a network you can use bigger images, and then the detection will use bigger images. The included networks all use 0.5 s images, so that is the chunk length. I was trying to optimize for detection quality, not speed.
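To make the consequence of a fixed chunk length concrete, here is a rough sketch of how a recording divides into fixed-length detection windows. It is an illustration only, not DeepSqueak's actual code: the function name `chunk_starts` and the `overlap_s` parameter are assumptions, and 0.5 s is the chunk length stated above for the included networks.

```python
import math


def chunk_starts(duration_s, chunk_s=0.5, overlap_s=0.0):
    """Return start times (in seconds) of fixed-length windows covering a recording.

    chunk_s is the window length the network was trained on; overlap_s is a
    hypothetical overlap between consecutive windows (0 means back-to-back).
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap_s must satisfy 0 <= overlap_s < chunk_s")
    step = chunk_s - overlap_s
    # Round before ceil so floating-point noise (e.g. 0.9 / 0.3) adds no extra window.
    n_chunks = max(1, math.ceil(round((duration_s - overlap_s) / step, 9)))
    return [i * step for i in range(n_chunks)]
```

Under these assumptions, a 60 s recording with 0.5 s windows and no overlap yields 120 windows. The number of detection passes is then set by the file length and the trained image size, not by a user-tunable setting.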

BerendWijers commented 2 years ago

Hi @DrCoffey, thank you for the answer! Removing the chunk length option is a very understandable choice, then.

I will relay the information to the group I'm working with. Furthermore, I will try to determine whether we still require a performance increase. If so, I will look at the pipeline design rather than trying to optimize the performance of DeepSqueak itself, as that is not possible.

DrCoffey / DeepSqueak

Performance tweaking multi detect #176