YOLO-DoA is an efficient YOLOv3-based approach for DoA estimation, which is formulated as a regression task onto spatially separated angular boxes. YOLO-DoA directly predicts the DoAs of sources, together with confidence scores, from the spectrum proxy, realizing end-to-end estimation. Simulation results demonstrate that the proposed approach outperforms several state-of-the-art methods in terms of network size, computational cost, prediction time, and accuracy of DoA estimation.
TABLE I: The effectiveness study of YOLO-DoA. MPS denotes mini-batches processed per second.
| Step | Methods | Parameters | GFLOPs | MPS | RMSE |
|---|---|---|---|---|---|
| A | YOLO-Basic | 22.391M | 81.041 | 1.74 | 1.9°, 6.3° |
| B | YOLO-ResNet18 | 5.496M | 18.649 | 3.61 | 1.9°, 7.2° |
| C | YOLO-ResNet18+ | 0.162M | 0.721 | 8.22 | 2.3°, 7.6° |
| D | + CSP Connection | 0.080M | 0.332 | 8.53 | 2.2°, 7.5° |
| E | + GIoU Loss | 0.080M | 0.332 | 8.23 | 1.5°, 6.5° |
| F | + SE Operation | 0.081M | 0.333 | 8.08 | 1.4°, 6.2° |
| G | + Grid Sensitive | 0.081M | 0.333 | 8.11 | 1.6°, 6.5° |
| H | + SPP Layer | 0.108M | 0.397 | 8.04 | 1.5°, 6.4° |
Through steps A → F, the construction of YOLO-DoA is completed. Compared with YOLO-Basic, both the parameters and the computational cost of YOLO-DoA are reduced by 99.6%, while the prediction speed is increased by a factor of 4.6 and the RMSE decreases noticeably. The effectiveness of YOLO-DoA is therefore confirmed. Moreover, the Grid Sensitive and Spatial Pyramid Pooling (SPP) modules are additionally tested in steps G and H. The results show that these two modules deteriorate the performance of DoA estimation, so they are not adopted in YOLO-DoA.
Fig. 1: The block diagram of the YOLO-DoA and the implementation details of SR, CSR, CBR, CBL, CBF, UPS and Head module.
TABLE 2: Implementation details of each module in YOLO-DoA. For simplicity, the CSR modules from left to right in the backbone are denoted CSR1 to CSR4, the CBF modules in the neck are denoted CBF1 to CBF3 from top to bottom, and the UPS modules in the neck are denoted UPS1 to UPS2 from top to bottom.
| Type | Notation | Input | Output |
|---|---|---|---|
| CBR | 7×1, 8 | 192×1×1 | 192×1×8 |
| Maxpooling | 3×1, 8 | 192×1×8 | 96×1×8 |
| CSR1 | (in, out) = (8, 16) | 96×1×8 | 48×1×16 |
| CSR2 | (in, out) = (16, 32) | 48×1×16 | 24×1×32 |
| CSR3 | (in, out) = (32, 64) | 24×1×32 | 12×1×64 |
| CSR4 | (in, out) = (64, 64) | 12×1×64 | 6×1×64 |
| CBF1 | in = 32 | 6×1×64 | 6×1×32 |
| UPS1 | | 6×1×32 | 12×1×16 |
| CBF2 | in = 32 | 12×1×48 | 12×1×32 |
| UPS2 | | 12×1×32 | 24×1×16 |
| CBF3 | in = 16 | 24×1×32 | 24×1×16 |
| Head | | 24×1×16 | 24×1×15 |
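To make the Head output concrete, the following sketch shows one plausible way its 24×1×15 tensor decomposes into per-box predictions. The factorization 15 = P × 5 (with P = 3 MicroRegs, and 5 = 4 box coordinates plus 1 confidence score, matching $\boldsymbol{\chi}^{i,j,1:4}$ and $\boldsymbol{\chi}^{i,j,5}$ in the loss below) is our reading of the table, not code from the repository:

```python
import numpy as np

# Assumed decomposition of the 24x1x15 Head output:
# S = 24 SubRegs, P = 3 MicroRegs per SubReg, 5 values per MicroReg
# (4 box coordinates + 1 confidence score).
S, P = 24, 3

head_output = np.random.rand(S, 1, P * 5)   # stand-in for the Head tensor
boxes = head_output.reshape(S, P, 5)        # (SubReg, MicroReg, coords+conf)

coords = boxes[..., 0:4]                    # chi^{i,j,1:4}: box coordinates
conf = boxes[..., 4]                        # chi^{i,j,5}:   confidence score
print(coords.shape, conf.shape)             # (24, 3, 4) (24, 3)
```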
The loss function is composed of a confidence loss and a regression loss. The weighted cross-entropy function is adopted as the confidence loss, and the regression loss of YOLOv3 is replaced with the generalized intersection over union (GIoU) function between the ground-truth boxes $\text{box}_{i,j}$ and the predicted boxes $\hat{\text{box}}_{i,j}$. The loss function is expressed as
$$
\begin{aligned}
\text{Loss}(\boldsymbol{\chi},\hat{\boldsymbol{\chi}}) = & -\sum_{i=1}^{S}\sum_{j=1}^{P} \left(1-\hat{\boldsymbol{\chi}}^{i,j,5}\right)^{\gamma} \boldsymbol{\chi}^{i,j,5} \log \zeta\!\left(\hat{\boldsymbol{\chi}}^{i,j,5}\right) \\
& -\sum_{i=1}^{S}\sum_{j=1}^{P} \vartheta_{i,j}^{\text{nobj}} \left(1-\hat{\boldsymbol{\chi}}^{i,j,5}\right)^{\gamma} \log\!\left(1-\zeta\!\left(\hat{\boldsymbol{\chi}}^{i,j,5}\right)\right) \\
& +\lambda_{\text{coord}} \sum_{i=1}^{S}\sum_{j=1}^{P} \boldsymbol{\chi}^{i,j,5} \left(1-\text{GIoU}\!\left(\boldsymbol{\chi}^{i,j,1:4}, \hat{\boldsymbol{\chi}}^{i,j,1:4}\right)\right)
\end{aligned}
$$
where $S$ and $P$ are the total numbers of SubRegs and MicroRegs, respectively. The first two terms constitute the confidence loss, and the third term is the regression loss. $\gamma$ is the weighting factor.
If the GIoU ratios between the predicted box $\hat{\text{box}}_{i,j}$ and all of the ground-truth bounding boxes are less than a threshold value, $\vartheta_{i,j}^{\text{nobj}}$ is 1; otherwise, $\vartheta_{i,j}^{\text{nobj}}$ is 0.
${\lambda}_{\text{coord}}$ is the penalty factor of the regression loss.
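As a minimal sketch of how the terms above fit together, the snippet below computes GIoU and the per-box loss for 1-D angular intervals. This is illustrative only, not the repository's implementation: the paper's boxes carry four coordinates, the interval form `(left, right)` is a simplification, and `gamma = 2.0` and `lambda_coord = 5.0` are placeholder values, not the trained settings.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def giou_1d(box, box_hat):
    """GIoU between two 1-D angular intervals (left, right), in degrees.
    GIoU = IoU - |C \\ (A u B)| / |C|, where C is the smallest enclosing
    interval; it stays informative (negative) even when boxes do not overlap."""
    l1, r1 = box
    l2, r2 = box_hat
    inter = max(0.0, min(r1, r2) - max(l1, l2))
    union = (r1 - l1) + (r2 - l2) - inter
    iou = inter / union
    c = max(r1, r2) - min(l1, l2)            # smallest enclosing interval
    return iou - (c - union) / c

def loss_terms(chi_conf, chi_hat_conf, box, box_hat,
               nobj_mask, gamma=2.0, lambda_coord=5.0):
    """Per-box loss: focal-weighted confidence terms + GIoU regression term.
    chi_conf is the ground-truth confidence (0 or 1), chi_hat_conf the raw
    predicted confidence passed through the sigmoid zeta(.)."""
    p = sigmoid(chi_hat_conf)
    obj_loss = -((1.0 - p) ** gamma) * chi_conf * math.log(p)
    nobj_loss = -nobj_mask * ((1.0 - p) ** gamma) * math.log(1.0 - p)
    reg_loss = lambda_coord * chi_conf * (1.0 - giou_1d(box, box_hat))
    return obj_loss + nobj_loss + reg_loss
```

For identical intervals `giou_1d` returns 1 and the regression term vanishes; for disjoint intervals it goes negative, so the regression loss still provides a gradient.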
The width W of the bounding box can affect the accuracy of DoA estimation. When W is too large, more irrelevant features are introduced during training, which hinders the learning of angular features. Conversely, when W is too small, multiple predicted boxes with similar confidence scores are generated for the same incident direction, which reduces the effectiveness of soft-NMS. Hence, we evaluate the RMSE of YOLO-DoA with respect to W given SNR = 9 dB and P = 3. As shown in subplot (a) of the following figure, the optimal value of W is 2°.
Moreover, the number of MicroRegs P can also affect the accuracy of DoA estimation. We further evaluate the RMSE versus P given SNR = 9 dB and W = 2°; the results are shown in subplot (b) of the following figure, and P = 3 is optimal. This is because when too few MicroRegs are utilized, sources with small angular spacing cannot be separated correctly, whereas too many MicroRegs produce more redundant predicted boxes, which also reduces the effectiveness of soft-NMS.
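To illustrate why redundant predicted boxes degrade the post-processing, here is a minimal Gaussian soft-NMS over 1-D angular boxes. It is a generic sketch under assumed settings (`sigma = 0.5`, `score_thresh = 0.001` are placeholders), not the repository's code: instead of discarding overlapping boxes outright, soft-NMS decays their confidence scores by their overlap with the currently selected box.

```python
import math

def iou_1d(a, b):
    """IoU between two angular intervals (left, right), in degrees."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0.0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS: repeatedly keep the highest-scoring box and decay
    the scores of the remaining boxes by exp(-iou^2 / sigma)."""
    pool = list(zip(boxes, scores))
    kept = []
    while pool:
        m = max(range(len(pool)), key=lambda k: pool[k][1])
        box_m, score_m = pool.pop(m)
        kept.append((box_m, score_m))
        # Decay (rather than discard) scores of boxes overlapping box_m,
        # then drop boxes whose score falls below the threshold.
        pool = [(b, s * math.exp(-iou_1d(box_m, b) ** 2 / sigma))
                for b, s in pool]
        pool = [(b, s) for b, s in pool if s >= score_thresh]
    return kept

# Two near-duplicate boxes around one direction plus one distant box:
# the duplicate's score is suppressed, the distant box is untouched.
result = soft_nms([(10.0, 12.0), (10.5, 12.5), (20.0, 22.0)],
                  [0.9, 0.8, 0.7])
```

When many MicroRegs emit near-identical boxes for one source, their mutually high overlaps mean the decayed scores stay close together, making it harder for the threshold to isolate a single winner per direction.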
[2022/01/19] Uploaded the source code of the YOLO-DoA model.
[2022/01/20] Uploaded the test files and prediction code.
[2022/08/20] Updated some details about the files.
[2023/01/15] Updated details of the loss function and network structure, and added an experiment.
python 3.8.6
PyCharm Community 2018.3.2
CUDA 10.0
NVIDIA GeForce RTX2080
Two Intel Xeon E5-2678v3 @2.50GHz CPUs and 128GB RAM
h5py 2.10.0
numpy 1.19.3
pandas 0.25.0
tensorflow-gpu 1.13.1
Issues should be raised directly in the repository. For professional support requests, please email Rong Fan at fanrong@cafuc.edu.cn.