Closed ZeqiMao closed 3 years ago
Hello Program-Kitty,
Thank you for your interest in the DALI dataset. I'm sorry if the doc wasn't clear enough. I'm preparing a PR that hopefully will clarify some vague explanations.
The dali_code.get_info function is meant to be used with a dali_info.gz file, which I forgot to add to the Zenodo repo for the second version. However, you can easily create it yourself as follows:
dali_info = [['DALI_ID', 'NAME', 'YOUTUBE', 'WORKING']]
for i in dali_data:
    uid = dali_data[i].info['id']
    name = "-".join([dali_data[i].info['artist'].replace(" ", "_"), dali_data[i].info['title'].replace(" ", "_")])
    youtube = dali_data[i].info['audio']['url']
    working = ''
    dali_info.append([uid, name, youtube, working])
And then just call the function as normal:
errors = dali_code.get_audio(dali_info, path_audio, skip=[], keep=[])
This should work. Please let me know if you have any trouble or further issues.
Best regards, Gabriel
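The loop in the reply above can be wrapped in a small helper so that it can be tried without the real dataset. What follows is a minimal sketch, not part of the DALI package: `build_dali_info` is a hypothetical name, and the `SimpleNamespace` entry only imitates the `.info` dictionary that the reply reads from (`id`, `artist`, `title`, `audio['url']`). All sample values are made up.

```python
from types import SimpleNamespace


def build_dali_info(dali_data):
    """Build the table passed to get_audio in the reply above.

    Hypothetical helper: it only assumes each entry exposes an `.info`
    dict with 'id', 'artist', 'title' and 'audio' -> 'url' keys.
    """
    rows = [['DALI_ID', 'NAME', 'YOUTUBE', 'WORKING']]
    for entry in dali_data.values():
        info = entry.info
        name = "-".join([info['artist'].replace(" ", "_"),
                         info['title'].replace(" ", "_")])
        # WORKING is left empty, as in the reply above.
        rows.append([info['id'], name, info['audio']['url'], ''])
    return rows


# Stand-in entry with made-up values, used instead of real DALI data.
fake_entry = SimpleNamespace(info={
    'id': 'abc123',
    'artist': 'Some Artist',
    'title': 'Some Title',
    'audio': {'url': 'xyz'},
})
demo_info = build_dali_info({'abc123': fake_entry})
print(demo_info[1])
```

With real data, the returned list would take the place of `dali_info` in the `get_audio` call shown above.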
Hi gabolsgabs! Thank you for your response. I solved the issue by following your instructions. I loaded dali_data (containing 7756 entries) following the instructions in the tutorial. But when I tried to load the ground-truth file, I ran into this issue:
path = '/ME/My Drive/Colab Notebooks/DALI-master/'
gt_file = path + 'gt_v1.0_22_11_18.gz'
gt = dali_code.utilities.read_gzip(gt_file)
dali_gt = dali_code.get_the_DALI_dataset(dali_data_path, gt_file, keep=gt.keys())
KeyError Traceback (most recent call last)
in ()
----> 1 dali_data = dali_code.update_with_ground_truth(dali_data, gt_file)

/usr/local/lib/python3.6/dist-packages/DALI/main.py in update_with_ground_truth(dali, gt_file)
     64     if len(gt) > 0:
     65         for i in gt:
---> 66             entry = dali[i]
     67             change_time(entry, gt[i]['offset'], gt[i]['fr'])
     68             entry.info['ground-truth'] = True

KeyError: '557037a547e84ddba8148c137eee0eb5'
I checked our dataset and didn't find this key in dali_data. Could you please check if this is the case on your end? I wonder if there's any issue with our ground truth file...
The ground-truth file does not work with version 2. The ids are different, and the alignment may also be different. If you plan to use the ground truth (which only provides the correct offset and frame rate parameters), please use version 1.
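Before applying a ground-truth file, it can help to check whether its ids exist in the loaded dataset at all. The sketch below uses plain dictionaries as stand-ins for `dali_data` and `gt`. `missing_ids` is a hypothetical helper, not a DALI function, and the only assumption is that both objects are keyed by entry id, as the traceback above suggests (`dali[i]` for `i in gt`).

```python
def missing_ids(dali_data, gt):
    """Return the ground-truth ids that are absent from dali_data."""
    return sorted(key for key in gt if key not in dali_data)


# Stand-in data: plain dicts keyed by id (values are placeholders).
dali_data_demo = {'id_a': object(), 'id_b': object()}
gt_demo = {
    'id_a': {'offset': 0.0, 'fr': 1.0},
    '557037a547e84ddba8148c137eee0eb5': {'offset': 0.0, 'fr': 1.0},
}

missing = missing_ids(dali_data_demo, gt_demo)
print(len(missing), "ground-truth ids not found in the dataset")
```

If most ids turn up as missing, the ground-truth file and the dataset probably come from different versions.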
Hi! Thank you for this comprehensive dataset! This might be a really stupid question, but I have trouble getting the audio files. I use Google Colab. I followed the tutorial and loaded the first two .gz files using the following code as a demo:
(Got output <DALI.Annotations.Annotations at 0x7f9925c4c160>)
Here I got the error message "TypeError: 'Annotations' object is not subscriptable". Also, if I try print(dali_info[0]), the same error message appears.
Could you please tell me if there's anything wrong with my data loading?
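This error is consistent with indexing a single annotation object rather than a list or dictionary. The sketch below reproduces it with a stand-in class. `FakeAnnotations` is hypothetical and only imitates the `.info` attribute used elsewhere in this thread, so it says nothing definitive about the real `DALI.Annotations` class. The sample values are taken from the output quoted in this thread.

```python
class FakeAnnotations:
    """Stand-in for one loaded entry; it defines no __getitem__."""

    def __init__(self, info):
        self.info = info


entry = FakeAnnotations({'artist': 'Janis Ian',
                         'audio': {'url': 'iepedfdjA80', 'working': False}})

try:
    entry[0]  # indexing the object itself
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Metadata is reached through the .info dictionary instead.
artist = entry.info['artist']
print(artist)
```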
Besides, I noticed that a lot of YouTube links show "working: False"... I'm not sure if this would affect data loading. Should I submit a request for an updated version of the data?
The output looks like {'artist': 'Janis Ian', 'audio': {'path': 'DALI_v2.0/audio/0a1a15671536498f8a856da781c017d7.mp3', 'url': 'iepedfdjA80', 'working': False},...
Thank you in advance for your help.
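To see how many entries carry the `working: False` flag, the entries can be grouped by that field. The sketch below assumes each entry exposes `.info['audio']['working']`, as in the output quoted above. `count_working` is a hypothetical helper and the demo entries are made up.

```python
from collections import Counter
from types import SimpleNamespace


def count_working(dali_data):
    """Count entries by the value of info['audio']['working']."""
    return Counter(entry.info['audio']['working']
                   for entry in dali_data.values())


def make_entry(working):
    # Made-up entry holding only the field this sketch reads.
    return SimpleNamespace(info={'audio': {'working': working}})


demo = {'a': make_entry(True), 'b': make_entry(False), 'c': make_entry(False)}
counts = count_working(demo)
print(counts[True], "working,", counts[False], "not working")
```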