fjchange / object_centric_VAD

A TensorFlow re-implementation of the CVPR 2019 paper "Object-centric Auto-Encoders and Dummy Anomalies for Abnormal Event Detection in Video"
MIT License

The results were not satisfactory #2

Open AndyHon opened 5 years ago

AndyHon commented 5 years ago

Hello, thanks for your help. I have finished the experiment. However, my result on Avenue is only 0.448 AUC, which is far from the reported result. May I ask what result you got in your tests? And how can I improve the results?

fjchange commented 5 years ago

The number of objects extracted from Avenue is only about 5500, which makes the result poor, but the author confirmed that this is exactly the number they got.

I don't know what params you used; in my experiments, no BN, no normalization, and no class_add gave better results. However, none of them perform as well as the author reported. Mine are below:

| Dataset | AUC |
| --- | --- |
| Ped2 | 86.51% |
| Avenue | 64.11% |
| ShanghaiTech | 80.35% |

Maybe the last part, the SVM, is where the problem lies. You could use cffi to link the dynamic library of vlfeat; that may work better. Since the framework has 3 stages, it might also benefit from unit testing. Sorry that I have no time to improve it. If you are interested, you can improve it yourself.
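For reference, a minimal sketch of what that clustering + SVM stage could look like with scikit-learn instead of vlfeat; the cluster count and the feature shapes are illustrative assumptions, not the repo's actual parameters:

```python
# A rough sketch of the clustering + one-versus-rest SVM stage using
# scikit-learn instead of vlfeat. k and all shapes are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def fit_cluster_svm(train_features, k=10):
    # Cluster latent features of normal objects into k pseudo-classes.
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(train_features)
    # LinearSVC trains one-vs-rest classifiers, so each cluster is
    # discriminated against the others (the paper's "dummy anomalies").
    return LinearSVC(C=1.0).fit(train_features, labels)

def anomaly_scores(svm, test_features):
    # A sample that no cluster claims (low maximum response) is anomalous,
    # so negate the highest decision value to get an anomaly score.
    return -np.max(svm.decision_function(test_features), axis=1)
```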

AndyHon commented 5 years ago

I really appreciate your guidance, and I really admire your successful implementation of this paper. I still have two questions; please help me to answer them. There is a function compute_auc_average in the evaluate.py file with the following code:

```python
for sub_loss_file in loss_file_list:
    # the name of dataset, loss, and ground truth
    dataset, psnr_records, gt = load_psnr_gt(loss_file=sub_loss_file)
    if dataset == 'shanghaitech':
        gt[51][5] = 0
    elif dataset == 'ped2':
        for i in range(7, 11):
            gt[i][0] = 0
    elif dataset == 'ped1':
        gt[13][0] = 0
    # the number of videos
    num_videos = len(psnr_records)
```

I don't quite understand why the ground truth for the different datasets is set to 0 at different places. Could you please help me explain it?

  1. Why are they set to 0 at these places?
  2. How were the corresponding places for each dataset obtained?
  3. How can I know where to set 0 in the ground truth of my own dataset?

In addition, I see that you directly use the model pre-trained on the COCO dataset. If I retrain an SSD-ResNet50 detection model on the experimental dataset, will the detected boxes be more accurate and the experimental results better? I'm sorry to bother you again and again. I sincerely appreciate your help and wish you a happy life.

fjchange commented 5 years ago
  1. The code you showed is a crude way to avoid NaN when calculating the AUC of a video in which every frame is annotated as abnormal (see the sketch after this list). There is no trick to it, and it has little influence on the result. You can use the original annotations, which do not have this problem. This part of the code calculates the AUC the way the author did, which is not reasonable.
  2. I can assure you that a better object detector gets a better result. However, there are no object annotations in any anomaly detection dataset; maybe you could do domain adaptation work, which should help. For the sake of speed, this paper uses SSD-ResNet50 to reduce computation so the method can work online. If you don't care about speed, a heavier but better detector can lead to a nicer result, especially for the Avenue dataset.
  3. I will spend some time over the next two days rewriting the last two stages, clustering and SVM, in MATLAB as the author did. Thanks for your attention to my work; you are welcome.
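A minimal sketch of the point in 1., assuming scikit-learn's roc_auc_score and labels where 1 means abnormal:

```python
# Why per-video AUC breaks when every frame of a video is labelled abnormal:
# the ROC curve needs both classes in y_true.
import numpy as np
from sklearn.metrics import roc_auc_score

scores = np.random.rand(100)  # illustrative per-frame anomaly scores
gt = np.ones(100)             # a video annotated abnormal in every frame

try:
    roc_auc_score(gt, scores)
except ValueError as err:
    print(err)  # only one class present in y_true, so AUC is not defined

# Flipping a single label, as gt[51][5] = 0 does for ShanghaiTech above,
# keeps both classes present so the per-video computation no longer fails.
gt[5] = 0
print(roc_auc_score(gt, scores))
```
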
amirmk89 commented 5 years ago

Thank you both for your work and useful insights. @fjchange, what results are you getting with the reimplementation when scoring as Liu et al.?

fjchange commented 5 years ago

I have tested the author's results scored as Liu et al., below (he sent me the anomaly_scores.txt by email):

| Dataset | AUC | Paper | Gap |
| --- | --- | --- | --- |
| Avenue | 86.56% | 90.4% | -3.84% |
| ShanghaiTech | 78.5645% | 84.9% | -6.3356% |

They have confirmed that.

amirmk89 commented 5 years ago

Thank you, and what are the best results you are able to achieve with your reimplementation, both calculated as the authors do and as Liu et al.?

fjchange commented 5 years ago

At my best, as the authors:

  • Avenue 64.11% (only 5500 objects detected in the training part)
  • Ped2 86.51%
  • ShanghaiTech 80.35% (no norm, smoothed (the param may change), no class_add (but the author says there should be), not in MATLAB)

xiadingZ commented 5 years ago

I followed your steps using the newest code, but I can't reproduce your AUC on ShanghaiTech, all with default params and not in MATLAB. Have you tested the newest version of your code?

amirmk89 commented 5 years ago

> At my best, as the authors:
>
>   • Avenue 64.11% (only 5500 objects detected in the training part)
>   • Ped2 86.51%
>   • ShanghaiTech 80.35% (no norm, smoothed (the param may change), no class_add (but the author says there should be), not in MATLAB)

Thank you again. But scoring properly as Liu et al., for which the authors claim they got 78.56 AUC, what is your result?

xiadingZ commented 5 years ago

When I train the model on ShanghaiTech, the three streams' losses are about 0.0010~0.0014. Are they correct?

AndyHon commented 5 years ago

Hello, I found a problem today. Since my detection results were not good, I redrew the rectangular boxes according to the box coordinates and found that the boxes did not correspond to the objects. Line 133 of test.py has:

```python
box = [int(box[0] * image_height), int(box[1] * image_height), int(box[2] * image_height), int(box[3] * image_width)]
```

But I have some questions about this. The box coordinates from SSD object detection are the upper-left and lower-right corners, so I think it should be:

```python
box = [int(box[0] * image_width), int(box[1] * image_height), int(box[2] * image_width), int(box[3] * image_height)]
```

I look forward to your valuable suggestions.
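For what it's worth, if the repo's detector comes from the TensorFlow Object Detection API, its boxes are returned as [ymin, xmin, ymax, xmax] normalized to [0, 1], so a conversion to pixel coordinates would look like this sketch (the helper name is mine, not the repo's):

```python
def to_pixel_box(box, image_height, image_width):
    # TF Object Detection API boxes are [ymin, xmin, ymax, xmax] in [0, 1]:
    # the y components scale by the image height, the x components by the width.
    ymin, xmin, ymax, xmax = box
    return [int(ymin * image_height), int(xmin * image_width),
            int(ymax * image_height), int(xmax * image_width)]
```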

AndyHon commented 5 years ago

In util.py:

```python
def box_image_crop(image_path, box, target_size=64):
    image = cv2.imread(image_path, 0)
    box = [box[0], box[1], box[0] + box[2], box[1] + box[3]]
    crop_image = image[box[0]:box[2], box[1]:box[3]]
    crop_image = cv2.resize(crop_image, dsize=(target_size, target_size))
    crop_image = np.array(crop_image).reshape((target_size, target_size, 1)).astype(np.float32) / 255.0
    return crop_image
```

Why do you scale the image size to 64 here? And is the SSD coordinate format the upper-left corner plus width and height?

fjchange commented 5 years ago

@AndyHon

  1. Using TensorBoard can help you see where the boxes land.
  2. The 64*64 crop size is what the author designed in the paper.
fanzijuan0625 commented 4 years ago

@fjchange Thank you for your work. I have a question about the gradients. The paper says: "For each object, we obtain two image gradients, one representing the change in motion from frame t−3 to frame t and one representing the change in motion from frame t to frame t + 3." But in your code, the gradients are only calculated from frame t−3 and frame t+3. Is that right?
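For illustration, here is a minimal sketch of the difference, assuming frames are grayscale float arrays (the helper names are mine, not the repo's):

```python
import numpy as np

def paper_gradients(frame_prev, frame_t, frame_next):
    # Two motion gradients as the paper describes:
    # frame t-3 -> frame t, and frame t -> frame t+3.
    return frame_t - frame_prev, frame_next - frame_t

def single_gradient(frame_prev, frame_next):
    # One difference straight from frame t-3 to frame t+3; motion that
    # reverses between the two spans cancels out and is lost.
    return frame_next - frame_prev

# Toy check: a pixel that brightens and then darkens is invisible to the
# single gradient but shows up in both of the paper's gradients.
f0, f1, f2 = np.zeros((2, 2)), np.ones((2, 2)), np.zeros((2, 2))
print(paper_gradients(f0, f1, f2))  # (+1, -1) everywhere
print(single_gradient(f0, f2))      # 0 everywhere
```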

horizonly commented 3 years ago

Hi, I have some problems improving the result. Can you share your code for my reference? Thank you!

lss0510 commented 3 years ago

> I really appreciate your guidance. I really admire your successful implementation of this paper. [...]

Hello, what's your final AUC on Avenue?