donggong1 / memae-anomaly-detection

MemAE for anomaly detection. -- Gong, Dong, et al. "Memorizing Normality to Detect Anomaly: Memory-augmented Deep Autoencoder for Unsupervised Anomaly Detection". ICCV 2019.
https://donggong1.github.io/anomdec-memae.html
MIT License

About the network for video anomaly detection #2

Open wanboyang opened 5 years ago

wanboyang commented 5 years ago

I have read your excellent paper. I have some questions about the following passage in Section 4.2: “Accordingly, the input of the network is a cuboid constructed by stacking 16 neighbor frames in grayscale. The structures of encoder and decoder are designed as: Conv3(3, 2, 96)-Conv3(3, 2, 128)-Conv3(3, 2, 256)-Conv3(3, 2, 256) and Dconv3(3, 2, 256)-Dconv3(3, 2, 256)-Dconv3(3, 2, 128)-Dconv3(3, 2, 1), where Conv3 and Dconv3 denote 3D convolution and deconvolution, respectively. A BN and a ReLU activation follow each layer (except the last one).”

1. Is a pooling layer used in the above structure? If so, what is its exact configuration?
2. What is the size of the input?
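For reference, a minimal PyTorch sketch of the architecture in the quoted passage. No pooling layer is mentioned in the paper; the stride-2 convolutions do the downsampling. The paper gives kernel size 3 and stride 2 but no padding, so padding=1 (with output_padding=1 in the decoder) is an assumption that makes the shapes round-trip, and the memory module between encoder and decoder is omitted:

    import torch
    import torch.nn as nn

    def conv_block(c_in, c_out):
        # Conv3(3, 2, c_out): 3D conv, kernel 3, stride 2, followed by BN + ReLU.
        # padding=1 is an assumption; the paper does not specify it.
        return nn.Sequential(
            nn.Conv3d(c_in, c_out, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm3d(c_out),
            nn.ReLU(inplace=True),
        )

    def deconv_block(c_in, c_out, last=False):
        # Dconv3(3, 2, c_out): 3D transposed conv; no BN/ReLU after the last layer.
        layers = [nn.ConvTranspose3d(c_in, c_out, kernel_size=3, stride=2,
                                     padding=1, output_padding=1)]
        if not last:
            layers += [nn.BatchNorm3d(c_out), nn.ReLU(inplace=True)]
        return nn.Sequential(*layers)

    class VideoAE(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                conv_block(1, 96), conv_block(96, 128),
                conv_block(128, 256), conv_block(256, 256),
            )
            self.decoder = nn.Sequential(
                deconv_block(256, 256), deconv_block(256, 256),
                deconv_block(256, 128), deconv_block(128, 1, last=True),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    x = torch.randn(2, 1, 16, 256, 256)  # 16 stacked grayscale frames per sample
    print(VideoAE()(x).shape)            # torch.Size([2, 1, 16, 256, 256])

With this padding choice, each stage halves all three spatio-temporal dimensions, so any input whose depth, height, and width are divisible by 16 reconstructs to its own shape.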
wanboyang commented 5 years ago

When I use the network hyperparameters from your paper, the network produces feature maps like:

    Layer (type)               Output Shape              Param #
    ================================================================
    Conv3d-1                   [8, 96, 7, 119, 159]        2,592
    BatchNorm3d-2              [8, 96, 7, 119, 159]          192
    ReLU-3                     [8, 96, 7, 119, 159]            0
    Conv3d-4                   [8, 128, 3, 59, 79]       331,776
    BatchNorm3d-5              [8, 128, 3, 59, 79]           256
    ReLU-6                     [8, 128, 3, 59, 79]             0
    Conv3d-7                   [8, 256, 1, 29, 39]       884,736
    BatchNorm3d-8              [8, 256, 1, 29, 39]           512
    ReLU-9                     [8, 256, 1, 29, 39]             0
    ConvTranspose3d-10         [8, 256, 3, 59, 79]     1,769,472
    BatchNorm3d-11             [8, 256, 3, 59, 79]           512
    ReLU-12                    [8, 256, 3, 59, 79]             0
    ConvTranspose3d-13         [8, 128, 7, 119, 159]     884,736
    BatchNorm3d-14             [8, 128, 7, 119, 159]         256
    ReLU-15                    [8, 128, 7, 119, 159]           0
    ConvTranspose3d-16         [8, 1, 15, 239, 319]        3,456
    BatchNorm3d-17             [8, 1, 15, 239, 319]            2

The loss cannot be computed because the input and output shapes do not match.
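Judging from the printed shapes, this run used three conv/deconv stages with padding 0 (both inferred from the summary, not stated in the paper). A quick sketch of the per-dimension size arithmetic shows why every output dimension comes up one short:

    def conv_out(n, k=3, s=2, p=0):
        # length along one dimension after a strided conv
        return (n + 2 * p - k) // s + 1

    def deconv_out(n, k=3, s=2, p=0, op=0):
        # length along one dimension after a transposed conv
        return (n - 1) * s - 2 * p + k + op

    for n in (16, 240, 320):        # input cuboid: 16 frames of 240x320
        e = n
        for _ in range(3):          # three stride-2 convs, padding 0
            e = conv_out(e)
        d = e
        for _ in range(3):          # three stride-2 deconvs, padding 0
            d = deconv_out(d)
        print(n, "->", e, "->", d)  # 16 -> 1 -> 15; 240 -> 29 -> 239; 320 -> 39 -> 319

With padding=1 on the convolutions and output_padding=1 on the transposed convolutions, each stage exactly halves and then doubles the size, so a 16x240x320 cuboid (all dimensions divisible by 2^3) reconstructs to its own shape.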

zhouwei342622 commented 5 years ago

Hello, Mr. Gong. Is there open-source code for this paper?

Wolfybox commented 4 years ago

> When I use the network hyperparameters from your paper, the network produces feature maps like: […] the loss cannot be computed because the input and output shapes do not match.

I got the same issue when testing: 'operands could not be broadcast together with shapes (16,1,240,368) (16,1,240,360)'. It seems that after the encode-decode pass the width was wrongly extended to 368. I troubleshot this and found it is a typical convolution sizing mistake: since the input width of 360 is not fully divisible by 16 (i.e. by four stride-2 convolutions), the latent width becomes 360/16 = 22.5, which PyTorch rounds up to 23. After the decoding process, which deconvolves the latent feature map four times, the final width comes out as 23 × 16 = 368, as observed.
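A quick check of that arithmetic, plus one possible workaround (the padding=1 configuration is assumed, and the 352-pixel target width below is just an example of a nearby multiple of 16):

    import math

    import torch
    import torch.nn.functional as F

    # Four stride-2 convs (kernel 3, padding 1 assumed) each take ceil(w/2):
    w = 360
    for _ in range(4):
        w = math.ceil(w / 2)       # 360 -> 180 -> 90 -> 45 -> 23
    print(w, w * 2 ** 4)           # 23 368: the decoder doubles four times

    # Workaround: resize (or crop) frames so height and width are multiples
    # of 16 before stacking them into cuboids.
    cuboids = torch.randn(8, 1, 16, 240, 360)   # hypothetical batch
    cuboids = F.interpolate(cuboids, size=(16, 240, 352),
                            mode="trilinear", align_corners=False)
    print(cuboids.shape)           # torch.Size([8, 1, 16, 240, 352])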

Wolfybox commented 4 years ago

The code is incomplete and not bug-free. Sad.