Closed Guybrush88 closed 1 year ago
This isn't a bug. That file is not part of the weekly export.
I asked TRANG for this information and she generated the file for me and put it here.
That said, I would love to see this updated from time to time, even if not every week.
As CK said, that file isn’t part of the weekly export.
@Guybrush88 Do you want that file to be included in the weekly export?
In my opinion, that could be relevant information for people using Tatoeba's data on external websites. I would personally use such files, with regular updates, if I wanted to reuse audio properly. Other opinions are welcome, though, so we can find a proper and more effective solution.
If this info is meant for people using Tatoeba's data on external websites, then including the licensing for each file might be a good idea. That info is already in the sentences_with_audio.tar.bz2 file.
Maybe just adding the dates into that file would accomplish what Guybrush88 desires.
That would be useful information for me, too.
As CK mentioned, sentences_with_audio.csv already has enough data to properly give attribution. The only information in sentences_with_audio_and_date.csv that is not already in sentences_with_audio.csv is the creation time and last modification time. However, this data is not very accurate. All audio uploaded before #1378 was merged has its date set to zero, which accounts for about 30% of all the files we have now. Between #1378 and #2880, disabling an audio file reset its date, and I know audio was disabled and re-enabled a number of times in order to temporarily allow editing sentences. I think the mp3 file's last modification date could be a much better indicator, but we do not export it at the moment.
Because of this, I suggest I just remove that file and close this ticket.
What I had actually asked TRANG for at the time she created this file was directory listings with the dates on the files. Assuming the files haven't had their dates changed while being moved around, those dates are more likely to be accurate for what I wanted.
I wanted to know this, so I could more quickly see which of my own audio files I might want to listen to and consider re-recording.
I would love to get such directory listings now if that's something you could do for me. I'm primarily interested in only the English audio files, but I could likely make use of a complete listing of all files.
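For reference, a directory listing like the one described above could be produced with a short script. This is only a rough sketch, not how Tatoeba generates anything: the function name `list_audio_with_dates`, the `.mp3` filter, and the example path are all assumptions for illustration, and it relies on the files' modification times having survived any moves or copies.

```python
import os
from datetime import datetime, timezone


def list_audio_with_dates(root, extension=".mp3"):
    """Return (relative_path, ISO-8601 UTC mtime) pairs for files under root.

    Results are sorted oldest first, so the recordings most likely to
    need re-recording appear at the top of the list.
    """
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extension):
                continue  # skip anything that is not an audio file
            full_path = os.path.join(dirpath, name)
            mtime = os.stat(full_path).st_mtime  # last modification time
            stamp = datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat()
            rows.append((os.path.relpath(full_path, root), stamp, mtime))
    rows.sort(key=lambda row: row[2])  # oldest modification time first
    return [(path, stamp) for path, stamp, _ in rows]


if __name__ == "__main__":
    # Hypothetical location; replace with the real audio directory.
    for path, stamp in list_audio_with_dates("audio/eng"):
        print(f"{stamp}\t{path}")
```

Restricting the listing to one language would just mean pointing `root` at that language's subdirectory, assuming the files are organised that way.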
I think closing this ticket would be OK. I think TRANG just put the file I requested in that directory for me with the intention of not leaving it there.
I removed these two files: sentences_with_audio_and_date.tar.bz2 and sentences_with_audio_and_date.csv.
To Reproduce: I was browsing this page to see the list of exported files: https://downloads.tatoeba.org/exports/ and I noticed that the ones containing the list of sentences with audio were last updated on 10-Oct-2019.
Expected behavior: The files containing the list of sentences with audio and their dates should be updated weekly along with the other exported files.