felixgontier / dcase-2023-baseline


Task 6A wrong csv #1

Closed asicoderOfficial closed 1 year ago

asicoderOfficial commented 1 year ago

The CSV file clotho_captions_development.csv contains some entries with leading spaces, such as [ typical neighborhood in Porto.wav], which should be [typical neighborhood in Porto.wav], without the leading space.

To fix this, preprocess the file as follows:

import pandas as pd

df = pd.read_csv('data/clotho_captions_development.csv')
df['file_name'] = df['file_name'].str.strip()
df.to_csv('data/clotho_captions_development.csv', index=False)

Please upload the corrected CSV and close this issue.

felixgontier commented 1 year ago

Hello, after double-checking, the 3 audio files in question also have a leading space in their names, so the clotho_captions CSV files do correctly match the audio files. For context, these names are unchanged from Freesound, where the clips were collected. I agree that this naming is poor practice and is not always handled well by file systems and libraries. Since it hasn't been a particular problem for participants in the past, a complete re-upload of the dataset on Zenodo to fix this is unlikely (note that I am not part of the team that curated the current version, nor the owner of the repository). However, we'll keep it in mind for any future update of Clotho.
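
For anyone who wants to confirm this locally before changing the CSV, here is a minimal sketch that compares the names in a captions file against the audio files on disk. It assumes the names are in the `file_name` column (as in the snippet above); the function name `check_file_names` and the directory paths in the usage note are hypothetical, so adjust them to your own layout. It uses the standard `csv` module, which keeps leading spaces by default.

```python
import csv
from pathlib import Path


def check_file_names(csv_path, audio_dir, column="file_name"):
    """Compare names in a captions CSV against the files in audio_dir.

    Returns (padded, missing):
      padded  - names that carry leading or trailing whitespace
      missing - names with no exactly matching file on disk
    Hypothetical helper for illustration, not part of the baseline code.
    """
    # Exact file names present in the audio directory (whitespace preserved).
    on_disk = {p.name for p in Path(audio_dir).iterdir() if p.is_file()}
    padded, missing = [], []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name = row[column]
            if name != name.strip():
                padded.append(name)
            if name not in on_disk:
                missing.append(name)
    return padded, missing
```

Calling it as `padded, missing = check_file_names('data/clotho_captions_development.csv', 'data/development')` (the audio directory here is an assumed path) should list the padded names in `padded`. If `missing` comes back empty, the CSV already matches the audio files, and stripping the names would break the match unless the audio files are renamed as well.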

asicoderOfficial commented 1 year ago

Thanks for the response! I'll close the issue; everything is clear now.