MorenoLaQuatra / audiocaps-download

This package aims at simplifying the download of the AudioCaps dataset.

Directly using yt_dlp to chunk audio clips could cause misalignment issues #3

Closed haidog-yaqub closed 10 months ago

haidog-yaqub commented 10 months ago

I just found that this download method can cause a misalignment problem. For example, the start time in the metadata is 5 seconds, but the downloaded audio actually starts at 10 seconds. This also causes some downloaded clips to be far shorter than 10 seconds. I suggest downloading the entire audio with yt_dlp and cutting the desired clips locally.

MorenoLaQuatra commented 10 months ago

Thank you for opening the issue.

Are you referring to the following line? https://github.com/MorenoLaQuatra/audiocaps-download/blob/dbe83b56ef97fc82143e78d99614ec30cb0135e7/audiocaps_download/Downloader.py#L295

How did you discover the bug?

haidog-yaqub commented 10 months ago

Yes. I checked the lengths of the files downloaded by your repo and found that some of them are very short, so I inspected them manually and found the problem. I also tried yt-dlp on the command line, and the issue persists there, so it appears to be a yt-dlp bug.

haidog-yaqub commented 10 months ago

I solved it by downloading the entire audio and then cutting it with other tools.

MorenoLaQuatra commented 10 months ago

Thank you so much for raising the issue, then. Can you suggest another tool to use? In your opinion, does downloading the full audio with yt-dlp and then cutting it manually (say, with torchaudio) solve the issue?

haidog-yaqub commented 10 months ago

Yes, downloading the entire audio and cutting it with torchaudio works.
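For illustration, the local cut comes down to converting the metadata times into sample indices. The sketch below is a minimal, dependency-free version of that arithmetic; the helper name `clip_sample_range` is hypothetical and not part of the package. It assumes times are given in seconds and clamps the end index, since a source video may be shorter than the requested end time.

```python
def clip_sample_range(start_seconds, end_seconds, sample_rate, num_samples):
    """Return (start, end) sample indices for a clip, clamped to the audio length."""
    if end_seconds <= start_seconds:
        raise ValueError("end_seconds must be greater than start_seconds")
    # Convert times in seconds to sample positions.
    start = max(0, int(round(start_seconds * sample_rate)))
    # Clamp the end so the slice never runs past the available audio.
    end = min(num_samples, int(round(end_seconds * sample_rate)))
    if start >= end:
        raise ValueError("clip lies outside the audio")
    return start, end


# Example: a 10-second clip starting at 5 s, cut from a 12-second recording at 16 kHz.
start, end = clip_sample_range(5, 15, 16000, 12 * 16000)
print(start, end, (end - start) / 16000)  # the clip is truncated to the 7 seconds that exist
```

The returned pair can be used directly as slice bounds on a `(channels, samples)` waveform, for example `waveform[:, start:end]`.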

MorenoLaQuatra commented 10 months ago

Thank you again. Just to verify that I have identified the problem correctly, do you think this code will solve the issue?

        # Download the file using yt-dlp
        # os.system(f'yt-dlp -x --audio-format {self.format} --audio-quality {self.quality} --output "{target_file_path}" --postprocessor-args "-ss {start_seconds} -to {end_seconds}" https://www.youtube.com/watch?v={ytid}')
        # Download the ENTIRE audio file
        os.system(f'yt-dlp -x --audio-format {self.format} --audio-quality {self.quality} --output "{target_file_path}" https://www.youtube.com/watch?v={ytid}')
        # now manually cut the audio file
        try:
            waveform, sample_rate = torchaudio.load(target_file_path)
            waveform = waveform[:, int(start_seconds * sample_rate):int(end_seconds * sample_rate)]
            torchaudio.save(target_file_path, waveform, sample_rate)
        except Exception as e:
            print('Error loading audio file: ', target_file_path)
            print(e)
            # delete file if it exists
            if os.path.isfile(target_file_path):
                # delete file
                os.remove(target_file_path)

I ran an automated check on a batch of 1000 files, and the durations appear to be correct.
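The script used for that check is not shown in the thread. As a rough sketch of what such a duration check might look like, the example below scans a directory and reports clips shorter than expected. It assumes the clips are uncompressed PCM WAV files (the standard-library `wave` module cannot read other formats), and the names `wav_duration_seconds` and `find_short_clips`, the 10-second target, and the 0.5-second tolerance are all illustrative choices.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds, read from its header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def find_short_clips(directory, expected_seconds=10.0, tolerance=0.5):
    """Return {filename: duration} for WAV clips shorter than expected."""
    short = {}
    for path in sorted(Path(directory).glob("*.wav")):
        duration = wav_duration_seconds(path)
        # Flag anything noticeably shorter than the expected clip length.
        if duration < expected_seconds - tolerance:
            short[path.name] = duration
    return short
```

Calling `find_short_clips` on the download directory returns an empty dictionary when every clip has the expected length. For other audio formats, the duration would have to be read with a different library.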

haidog-yaqub commented 10 months ago

Yes, I think it should work. You can compare the results with the previously downloaded data, especially the short clips.
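One way to make that comparison, sketched under the assumption that clip durations from both runs have already been collected into dictionaries keyed by filename: list the clips that became longer after the fix, starting with those that were shortest before. The function name `lengthened_clips` and the sample durations are made up for illustration and are not measurements from the dataset.

```python
def lengthened_clips(old_durations, new_durations, min_gain=0.5):
    """Return (name, old, new) for clips that grew by at least min_gain seconds."""
    changed = []
    for name, old in old_durations.items():
        new = new_durations.get(name)
        # Skip clips missing from the new run or essentially unchanged.
        if new is not None and new - old >= min_gain:
            changed.append((name, old, new))
    # Shortest original clips first, since those were the most affected.
    return sorted(changed, key=lambda item: item[1])


# Illustrative durations in seconds (not real measurements).
old = {"clip_a.wav": 3.2, "clip_b.wav": 10.0, "clip_c.wav": 6.5}
new = {"clip_a.wav": 10.0, "clip_b.wav": 10.0, "clip_c.wav": 10.0}
for name, before, after in lengthened_clips(old, new):
    print(f"{name}: {before:.1f}s -> {after:.1f}s")
```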

MorenoLaQuatra commented 10 months ago

I will close the issue. Please reopen it if anything else is missing.