YapengTian / AVE-ECCV18

Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018
https://sites.google.com/view/audiovisualresearch

Failed to download audio_feature.h5 #20

Closed · asker-github closed this issue 3 years ago

asker-github commented 4 years ago

Does anyone have a Baidu Netdisk or Thunder (Xunlei) link for audio_feature.h5? I can only download it with the Chrome browser, and because the file is so big, the download fails every time.

asker-github commented 4 years ago

I tried generating audio_feature.h5 myself, but I don't know whether doing so will have any adverse effect.

YapengTian commented 4 years ago

I uploaded it to Dropbox. Here is the link: https://www.dropbox.com/s/djweo9ew9pqv8xi/audio_feature.h5?dl=0.

asker-github commented 4 years ago

> I uploaded it to Dropbox. Here is the link: https://www.dropbox.com/s/djweo9ew9pqv8xi/audio_feature.h5?dl=0.

Oops, that's my mistake. It should be visual_feature.h5. I was so excited that I typed the wrong file name.

asker-github commented 4 years ago

I tried downloading from your link; the speed is about the same as the links in the README. Since I'm downloading with Chrome, I may still fail even with the Dropbox upload: every time I get about halfway, the download fails. Maybe my network is just not very good, haha. I may have to generate the file myself. Thank you.
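A resumable transfer may work where a browser download keeps failing halfway. Below is a minimal Python sketch, under two assumptions: that Dropbox honors HTTP `Range` requests, and that changing `?dl=0` to `?dl=1` in the link forces a direct file response.

```python
import os
import requests

# Resumable download sketch: if a partial file exists, ask the server
# for only the remaining bytes and append to the file.
url = "https://www.dropbox.com/s/djweo9ew9pqv8xi/audio_feature.h5?dl=1"
path = "audio_feature.h5"

done = os.path.getsize(path) if os.path.exists(path) else 0
headers = {"Range": f"bytes={done}-"} if done else {}

with requests.get(url, headers=headers, stream=True, timeout=60) as r:
    r.raise_for_status()
    # 206 means the server honored the Range request; otherwise restart.
    mode = "ab" if done and r.status_code == 206 else "wb"
    with open(path, mode) as f:
        for chunk in r.iter_content(chunk_size=1 << 20):
            f.write(chunk)
```

Re-running the script after a failure continues from wherever the previous attempt stopped.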

asker-github commented 4 years ago

The file I generated is 8.3 GB, produced following the video names in each line of Annotations.txt. But the one you provide is 7.7 GB, and I don't know what accounts for the difference.

YapengTian commented 4 years ago

If you used the provided scripts and followed the order of Annotations.txt, it should be correct.
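The on-disk size alone is not conclusive, since dtype (float64 vs. float32) or HDF5 compression settings can change file size without changing the stored features. A small sketch for comparing the two files directly; the dataset key `"avadataset"` is an assumption here, so check `f.keys()` first if it differs:

```python
import h5py

# Compare a self-generated feature file against the provided one by
# shape, dtype, and compression rather than by file size.
for path in ("audio_feature_mine.h5", "audio_feature.h5"):
    with h5py.File(path, "r") as f:
        print(path, "keys:", list(f.keys()))
        d = f["avadataset"]  # assumed key; adjust to whatever keys above show
        print("  shape:", d.shape, "dtype:", d.dtype,
              "compression:", d.compression)
```

Identical shapes and dtypes with different compression would fully explain an 8.3 GB vs. 7.7 GB gap.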

asker-github commented 4 years ago

> If you used the provided scripts and followed the order of Annotations.txt, it should be correct.

Hello, my torch version is 1.5.1. When I ran the test (`python supervised_main.py --model_name AV_att`), this error occurred:

```
Traceback (most recent call last):
  File "supervised_main.py", line 159, in <module>
    test(args)
  File "supervised_main.py", line 148, in test
    x_labels = model(audio_inputs, video_inputs)
  File "/home/zhu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhu/zhu_tf/audio_visual/AVE-ECCV18-master/models.py", line 66, in forward
    self.lstm_video.flatten_parameters()
  File "/home/zhu/.local/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 106, in flatten_parameters
    if len(self._flat_weights) != len(self._flat_weights_names):
  File "/home/zhu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
    type(self).__name__, name))
AttributeError: 'LSTM' object has no attribute '_flat_weights'
```

I'm trying to fix this error right now. I would like to know whether, once I solve it, I can continue to train or test with the model you provided.

YapengTian commented 4 years ago

I was using PyTorch 0.3.0. If you run it with 1.5.1, I think you need to modify the code accordingly.
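For this particular `_flat_weights` error, a common cause (an assumption here, not a verified diagnosis for this repo) is a checkpoint saved as a whole pickled module under PyTorch 0.3.x: the unpickled LSTM instances lack attributes that newer PyTorch expects. A minimal, self-contained sketch of the usual weights-only workaround:

```python
import torch
import torch.nn as nn

# Save/load the state_dict instead of the pickled module, so the LSTM
# object is always constructed by the currently installed PyTorch and
# carries the attributes (e.g. _flat_weights) that version expects.
old = nn.LSTM(input_size=128, hidden_size=128, batch_first=True)
torch.save(old.state_dict(), "lstm_weights.pt")   # weights only, no module pickle

new = nn.LSTM(input_size=128, hidden_size=128, batch_first=True)
new.load_state_dict(torch.load("lstm_weights.pt", map_location="cpu"))
new.flatten_parameters()  # succeeds: instance was built by current torch
```

For an existing full-module checkpoint, `torch.load(path, map_location="cpu").state_dict()` can often extract the weights for transfer into a freshly constructed model.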

asker-github commented 4 years ago

> I was using PyTorch 0.3.0. If you run it with 1.5.1, I think you need to modify the code accordingly.

Hello, first of all, thank you for your kind reply. I have two questions for you. ^_^

1. weak_supervised_main.py: What are visual_feature_noisy.h5, audio_feature_noisy.h5, mil_labels.h5, and labels_noisy.h5? I also don't know whether these files correspond to Annotations.txt; I ask because I want to study several other classes.

2. cmm_train.py: What are labels_closs.h5, visual_feature_vec.h5, train_order_match.h5, val_order_match.h5, and test_order_match.h5? I have no idea what these files are, and visual_feature_vec.h5 is not available for download, which also troubles me. Looking forward to your reply, thank you!

YapengTian commented 4 years ago

The noisy features are from some randomly selected videos in the background class; they do not correspond to Annotations.txt. The videos can be found at https://drive.google.com/file/d/1Iqba9lk_KOxxf5CFV33_XVoC5nuG8wiu/. The mil_labels are video-level labels.
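For intuition about how a video-level (MIL) label relates to per-segment labels, here is an illustrative sketch only; the 10-segment-by-29-class shape is an assumption about the annotation layout, not this repo's exact preprocessing:

```python
import numpy as np

# A video-level (MIL) label marks an event class as present if it occurs
# in ANY temporal segment of the video. Shapes assumed: 10 one-second
# segments, 29 classes (28 events + background).
segment_labels = np.zeros((10, 29), dtype=np.float32)
segment_labels[3:6, 7] = 1.0               # class 7 active in segments 3-5
video_label = segment_labels.max(axis=0)   # class 7 marked present video-wide
```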

As noted in the README, visual_feature_vec.h5 can be downloaded from https://drive.google.com/file/d/1l-c8Kpr5SZ37h-NpL7o9u8YXBNVlX_Si/view. labels_closs.h5 contains labels for the contrastive loss, visual_feature_vec.h5 contains the visual features, and the other three files store the data-splitting orders.
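To see what each of these files actually contains, a generic sketch using the standard h5py API (no dataset names inside the files are assumed; it prints whatever is there):

```python
import h5py

# Walk every dataset in an HDF5 file and print its path, shape, and
# dtype -- usually enough to tell features, labels, and split orders apart.
def describe(path):
    print(path)
    with h5py.File(path, "r") as f:
        f.visititems(lambda name, obj: print(" ", name, obj.shape, obj.dtype)
                     if isinstance(obj, h5py.Dataset) else None)

for name in ("labels_closs.h5", "train_order_match.h5", "visual_feature_vec.h5"):
    describe(name)
```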

asker-github commented 4 years ago

I retrained on Male speech, Female speech, and background with your supervised task. The accuracy is about the same as yours, but the localization of the sounding region in the frames is very poor (python attention_visualization.py). How can I achieve the results shown in your paper?

YapengTian commented 3 years ago

I used data from all the different categories to train the model before. Since you only used the limited speech data, it is reasonable that the model fails to find the sounding parts for objects in other categories.

YapengTian commented 3 years ago

If you only want to explore face-speech data, you might train the model on a large set containing only human-talking videos, such as the active speaker detection dataset: https://arxiv.org/abs/1901.01342.

asker-github commented 3 years ago

> I used data from all the different categories to train the model before. Since you only used the limited speech data, it is reasonable that the model fails to find the sounding parts for objects in other categories.

I'm training on just these three categories, and I only want to recognize these three, but the results are not good. Maybe it's because there are too few training categories. Thank you for your recommendation.