
AdversarialPoseNet-2DMedical

Yu Chen's Adversarial-PoseNet for landmark localization in 2D medical images (lower extremities)

PyTorch implementation of Chen et al.'s "Adversarial PoseNet" for landmark localization on medical data. The method was proposed by Yu Chen, Chunhua Shen, Xiu-Shen Wei, Lingqiao Liu, and Jian Yang in *Adversarial PoseNet: A Structure-aware Convolutional Network for Human Pose Estimation*.

Goal

The goal of this work is to investigate the role of adversarial learning for localizing 6 landmarks in the lower extremities, using a dataset of 660 X-ray images, by implicitly incorporating priors about the structure of the lower-extremity pose during network training. For this analysis, an established generative adversarial network architecture that predicts heatmaps is used. The network is trained and tested on X-ray images of the lower extremities and evaluated in terms of localization accuracy (within a 10 mm tolerance). Occlusions in medical images arise from causes such as prosthetics on the bone joints or a restricted field of view, making landmark localization harder. Under these conditions, however, human vision can still predict near-accurate poses by exploiting the geometric orientation and inter-connectivity of the bone joints in the image.
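Since the network regresses one heatmap per landmark rather than raw coordinates, each ground-truth landmark is typically rendered as a 2D Gaussian. A minimal sketch of this step, assuming an illustrative image size, sigma, and function name (not the repository's actual values):

```python
import numpy as np

def make_heatmap(center_xy, shape=(256, 256), sigma=2.0):
    """Render a single landmark as a 2D Gaussian heatmap."""
    h, w = shape
    xs = np.arange(w)[None, :]
    ys = np.arange(h)[:, None]
    cx, cy = center_xy
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Stack one heatmap per landmark -> a (6, H, W) training target
landmarks = [(40, 60), (80, 120), (130, 30), (60, 200), (200, 180), (100, 100)]
target = np.stack([make_heatmap(p) for p in landmarks])
```

Each channel peaks at 1.0 exactly at its landmark's pixel, which is what the generator is trained to reproduce.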

Model Architecture:

1. Generator architecture:

Network training:

Adversarial training:

Pose discriminator training:

where p_real is the ground-truth label for the real heatmaps; all of its entries are set to 1. p_fake is the label for the generated (fake) heatmaps and has size [1 x 6]; each entry of p_fake is either 0 or 1: 0 if the predicted keypoint is incorrectly localized, 1 if it is accurately localized.

Confidence discriminator training:

where c_fake is the ground-truth confidence label for the fake heatmaps. While training the confidence network, the real heatmaps are labelled with a 1 x 6 unit vector c_real (6 is the number of body parts). The confidence of a fake (predicted) heatmap should be high when it is close to the ground truth and low otherwise, so each entry of c_fake is either 0 or 1: 0 if the predicted keypoint is incorrectly localized, 1 if the generator localized it accurately.
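Both discriminator targets are 1 x 6 binary vectors built the same way: an entry is 1 when the generator's predicted heatmap peak falls within some tolerance of the ground-truth landmark. A hedged sketch of how p_fake / c_fake could be constructed (the function names and the 10-pixel tolerance are illustrative assumptions):

```python
import numpy as np

def argmax_2d(heatmap):
    """(x, y) coordinates of the peak of a single heatmap."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return np.array([x, y], dtype=float)

def fake_labels(pred_heatmaps, gt_coords, tol_px=10.0):
    """1x6 binary vector: 1 where the predicted peak lies within
    tol_px of the ground-truth landmark, else 0."""
    labels = []
    for hm, gt in zip(pred_heatmaps, gt_coords):
        dist = np.linalg.norm(argmax_2d(hm) - np.asarray(gt, dtype=float))
        labels.append(1.0 if dist < tol_px else 0.0)
    return np.array(labels)
```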

Generator training (multi-tasking):

Task 1:

Task 2:

Sample input images (left) and their corresponding ground-truth heatmaps (right):

Results Visualization

The results of this implementation:

Adversarial PoseNet:

Stacked hourglass network (supervised setup):

Localization rate (percentage of correct keypoints) within 10 mm on the test set:

Please note: the localization rate (percentage of correct keypoints) within 20 mm was on average 98% across all six landmarks.

Metric used:

- Euclidean distance(predicted coordinates, ground-truth coordinates) < 10 mm
- Euclidean distance(predicted coordinates, ground-truth coordinates) < 20 mm
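This metric can be sketched as follows; the function name and the pixel-to-millimetre spacing `px_per_mm` are assumptions for illustration, since the dataset's actual pixel spacing is not stated here:

```python
import numpy as np

def localization_rate(pred, gt, px_per_mm=1.0, tol_mm=10.0):
    """Percentage of keypoints whose Euclidean error is below tol_mm.
    pred, gt: (N, 6, 2) arrays of (x, y) coordinates in pixels."""
    dists_mm = np.linalg.norm(pred - gt, axis=-1) / px_per_mm
    return 100.0 * (dists_mm < tol_mm).mean()
```

Passing `tol_mm=20.0` instead yields the 20 mm rate quoted above.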

For more information, refer to:

Main Prerequisites

Getting Started

Installation

Training and Test Details

To train a model, run any of the .sh files whose name starts with "train", for example:

trainmodelmedical-exp-22.sh 
During training, one can see how the network is learning on batches of input samples by looking inside the trainingImages/ folder, and can also visualize the network's area of interest while it localizes keypoints, using the localization maps of the last convolutional layers as shown below:

 <img src="https://github.com/abhishekdiphu/Automatic-keypoint-localization-in-2dmedical-images/raw/main/readmeimages/superim.png" width="300px"/>
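One common way to obtain such localization maps is a PyTorch forward hook on the last convolutional layer. The sketch below uses a toy stand-in model, not the repository's actual generator, to show the mechanism:

```python
import torch
import torch.nn as nn

# Toy stand-in for the generator's trunk; the real network differs.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 6, 3, padding=1),  # final conv: one map per landmark
)

activations = {}
def save_activation(module, inputs, output):
    activations["last_conv"] = output.detach()

# Capture the final convolution's output on every forward pass
model[-1].register_forward_hook(save_activation)

x = torch.randn(1, 1, 64, 64)
_ = model(x)
fmap = activations["last_conv"][0]  # (6, 64, 64)
# Normalize each map to [0, 1] before overlaying it on the input image
lo = fmap.amin(dim=(1, 2), keepdim=True)
hi = fmap.amax(dim=(1, 2), keepdim=True)
fmap = (fmap - lo) / (hi - lo + 1e-8)
```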

Models are saved to `./trainmodel/` (the directory can be changed via `--modelName`).

To test the model, run:
```bash
test.sh
```

Datasets

The dataset is not publicly available; it was obtained from the authors of the paper "Detection and Localization of Landmarks in the Lower Extremities Using an Automatically Learned Conditional Random Field".

Reference