ExplainableML / TCAF-GZSL

This repository contains the code for our ECCV 2022 paper "Temporal and cross-modal attention for audio-visual zero-shot learning"
MIT License

ActivityNet and VGGsound dataset error #5

Closed: Qichen98fly closed this issue 1 year ago

Qichen98fly commented 2 years ago

Thanks for your previous email.

I used the original TCAF code and the pre-extracted features, but there are still errors for the ActivityNet and VGGSound datasets (UCF101 runs fine, and I have re-downloaded the ActivityNet and VGGSound features several times). Could you please rerun with your uploaded features to double-check, and upload a new version of the features if needed?

(Screenshots: 2022-11-02 16-53-41, 2022-11-02 16-51-47)

MerceaOtniel commented 2 years ago

So basically you have audio and video folders, which are split into stage_* folders. Each stage folder contains some .pkl files, and each .pkl file represents a class. The error says that the number or names of the .pkl files in one of these stage_* folders do not match between audio and video. From the screenshots above, it seems that for VGGSound it fails for the stage_2 test split. I checked VGGSound again today, did not get any error, and was able to generate the features. Moreover, I re-downloaded the dataset and looked manually through the file names, and again they match between audio and video.

I would advise you to first look in the audio and video folders for VGGSound and check that the file names match and that you have exactly the same number of files in audio and video (a quick script for this is sketched after the next paragraph). You could also use a debugger to see exactly which files it loads for each of the splits.

The same applies to ActivityNet, but from the screenshot above it seems that this one fails for the stage_1 val split.
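For reference, here is a minimal sketch of such a check, assuming the layout described above (the root path and the stage_* pattern are assumptions; adjust them to your local setup):

    # Sketch: compare the .pkl class files between the audio and video folders
    # for every stage_* split. The layout <root>/{audio,video}/stage_*/<class>.pkl
    # follows the description above; the root path is an assumption.
    from pathlib import Path

    def compare_stage_folders(root):
        root = Path(root)
        stages = sorted(p.name for p in (root / "audio").glob("stage_*"))
        for stage in stages:
            audio = {p.name for p in (root / "audio" / stage).rglob("*.pkl")}
            video = {p.name for p in (root / "video" / stage).rglob("*.pkl")}
            if audio == video:
                print(f"{stage}: OK ({len(audio)} .pkl files)")
            else:
                print(f"{stage}: only in audio: {sorted(audio - video)}")
                print(f"{stage}: only in video: {sorted(video - audio)}")

    compare_stage_folders("avgzsl_benchmark_non_averaged_datasets/VGGSound")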

Qichen98fly commented 2 years ago

Thanks for your reply.

I think the problem actually happens in the text labels or the extracted features. For the error reported in my last message, I used the original code downloaded from this GitHub repository and the original features downloaded from the link you provided.

Now, I have added some print statements to read_dataset in the VGGSound dataset:

    def read_dataset(self, dataset_type):
        result_audio = self.get_data_by_modality(modality="audio", dataset_type=dataset_type)
        result_video = self.get_data_by_modality(modality="video", dataset_type=dataset_type)
        torch.set_printoptions(profile="full")
        print("audio_shape:", result_audio["target"].shape, "video_shape:", result_video["target"].shape)

        result_c = (result_audio["target"] - result_video["target"]) > 0.5
        print("result_c", result_c)
        print("nonzero:", torch.nonzero(result_c))
        assert torch.equal(result_audio["target"], result_video["target"])

I found that some audio labels and video labels are mismatched:

    nonzero: tensor([[2499], [2500], [2501], [2502], [2503], [4785], [4786], [4787], [4788], [4789]])
    Traceback (most recent call last):
      File "main.py", line 277, in <module>
        run()
      File "main.py", line 30, in run
        path_stage_1, best_epoch = main(args)
      File "main.py", line 89, in main
        train_dataset = VGGSoundDataset(
      File "/OSM/CBR/D61_RCV/students/zhe031/TCAF-GZSL_1/src/dataset.py", line 195, in __init__
        self.preprocess()
      File "/OSM/CBR/D61_RCV/students/zhe031/TCAF-GZSL_1/src/dataset.py", line 222, in preprocess
        test_set = self.read_dataset(dataset_type="test")
      File "/OSM/CBR/D61_RCV/students/zhe031/TCAF-GZSL_1/src/dataset.py", line 268, in read_dataset
        assert torch.equal(result_audio["target"], result_video["target"])
    AssertionError
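A quick sketch of a symmetric variant of this check (assuming result_audio and result_video are still in scope): since (a - b) > 0.5 only flags positions where the audio label exceeds the video label, an inequality test catches mismatches in both directions and shows the label values at each disagreeing index:

    # Sketch: symmetric mismatch check. (a - b) > 0.5 above only flags
    # positions where the audio label exceeds the video label; != catches
    # both directions. result_audio / result_video are the dicts returned
    # by get_data_by_modality, as in read_dataset above.
    import torch

    mismatch = torch.nonzero(result_audio["target"] != result_video["target"]).flatten()
    for i in mismatch.tolist():
        print(i, "audio:", result_audio["target"][i].item(),
                 "video:", result_video["target"][i].item())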

Note that the file names and counts between audio and video look fine when checking manually, since I downloaded the same archives from your link. And I still hit this error after downloading the features multiple times.

The 'nonzero: tensor([[2499], [2500], [2501], [2502], [2503], [4785], [4786], [4787], [4788], [4789]])' output above identifies which entries of result_audio["target"] and result_video["target"] do not match in the VGGSound dataset. As your audio and video data are loaded through the same txt/csv file, I think the problem may lie in the pre-extracted features. 1) Do you think my guess is correct?

2) Additionally, did you comment out the 'assert torch.equal(result_audio["target"], result_video["target"])' line when you ran the program?

3) If my guess is correct and you did not comment out the assert when running the program, would you mind uploading the VGGSound and ActivityNet features with which you did not encounter any error, please?

hummelth commented 2 years ago

We tried to replicate your error but were not able to. We freshly cloned the repository and downloaded the features from the provided link, so the issue is not caused by our features or code. Please make sure that you follow the steps below exactly.

As a suggestion, you can try deleting the avgzsl_benchmark_non_averaged_datasets/VGGSound/_features_processed/ folder to rule out that a previous (incomplete) extraction is causing this. Also make sure that your system has enough memory to finish the extraction. The error could also be caused by accidentally mixing files from different datasets; I would suggest creating a fresh clone of the repository to rule this out.

Regarding your other question: we do not comment out any portions of the code or assert statements; we run the code as is.

1) Clone the repository.
2) Install and activate the conda environment.

Assuming you are in the root directory of the project:

3) Download the data:
    wget https://s3.mlcloud.uni-tuebingen.de/tcaf-gzsl/vggsound-supervised-temporal.zip
4) Unzip it to the avgzsl_benchmark_non_averaged_datasets folder:
    unzip vggsound-supervised-temporal.zip -d avgzsl_benchmark_non_averaged_datasets/
5) Run:
    python3 main.py --cfg config/best/best_cls/best_vggsound.yaml --root_dir avgzsl_benchmark_non_averaged_datasets/VGGSound/ --log_dir logs --dataset_name VGGSound --run all
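If the error still persists after these steps, here is a quick sketch of two sanity checks to run before step 5 (the paths follow the wget/unzip steps above; adjust them if your layout differs):

    # Sketch: rule out a corrupted download and a stale feature cache.
    import shutil
    import zipfile

    # testzip() re-reads every archive member and returns the name of the
    # first corrupted file, or None if the archive is intact.
    with zipfile.ZipFile("vggsound-supervised-temporal.zip") as zf:
        bad = zf.testzip()
        print("archive OK" if bad is None else f"corrupted member: {bad}")

    # Remove any previous (possibly incomplete) processed-feature cache, as
    # suggested above, so the features are regenerated from scratch.
    shutil.rmtree(
        "avgzsl_benchmark_non_averaged_datasets/VGGSound/_features_processed/",
        ignore_errors=True,
    )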