Tatoeba / tatoeba2

Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
https://tatoeba.org
GNU Affero General Public License v3.0

Korean audio recordings are unavailable (but files still present) #3062

Closed. Yorwba closed this issue 1 year ago.

Yorwba commented 1 year ago

I noticed that Korean is no longer listed on the audio index page and the exported sentences_with_audio file has no recordings for sentences in Korean either.

But I remember clearly that we used to have recordings in Korean and indeed the /var/www-audio/sentences/kor/ directory on the server contains 477 .mp3 files.

The Internet Archive has a snapshot from 2022-08-10 with Korean audio still present and another snapshot from 2023-01-27 without. The date range would seem to rule out the database migrations in #2880 and #3037 as causes.

That leaves two possible explanations I can think of:

  1. The user who contributed the recordings asked for them to be deleted. In that case it's not a bug that they're no longer available, but it's arguably a bug that the files are still present on the server.
  2. We have some kind of database corruption.
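To tell explanation 1 apart from explanation 2, one could compare the .mp3 files on disk against the sentence IDs that still have audio in the export. Below is a minimal sketch that assumes the files are named `<sentence_id>.mp3` (inferred from the public audio URLs in this thread) and that the export is tab-separated with the sentence ID in the first column. Both are assumptions, and `find_orphaned_audio` and `load_ids_with_audio` are hypothetical helpers, not part of Tatoeba's code.

```python
from pathlib import Path


def load_ids_with_audio(export_path: str) -> set[int]:
    """Read sentence IDs from a tab-separated export.

    Assumption: the sentence ID is the first column of each row.
    Rows whose first column is not a plain integer (e.g. a header) are skipped.
    """
    ids = set()
    with open(export_path, encoding="utf-8") as f:
        for line in f:
            first = line.split("\t", 1)[0].strip()
            if first.isdigit():
                ids.add(int(first))
    return ids


def find_orphaned_audio(audio_dir: str, ids_with_audio: set[int]) -> list[Path]:
    """Return .mp3 files in audio_dir whose sentence ID has no audio record.

    Assumption: each file is named '<sentence_id>.mp3'.
    """
    orphans = []
    for path in sorted(Path(audio_dir).glob("*.mp3")):
        try:
            sentence_id = int(path.stem)
        except ValueError:
            # Skip files that do not follow the '<id>.mp3' naming scheme.
            continue
        if sentence_id not in ids_with_audio:
            orphans.append(path)
    return orphans
```

Called as `find_orphaned_audio("/var/www-audio/sentences/kor", load_ids_with_audio("sentences_with_audio.csv"))`, a long list of orphans with no database rows would fit explanation 1 (records deleted, files left behind) better than random corruption.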
ckjpn commented 1 year ago

audio - kor - by pon00050

He wrote and told us that he wasn't a native Korean speaker and asked to have the audio files removed.

Yorwba commented 1 year ago

That explains why there are 477 files on the server, even though the 2022-08-10 archive.org snapshot shows 292 recordings.

What about audio - kor - by Eunhee?

ckjpn commented 1 year ago

I don't know. I guess Gillux may need to take care of those.

ckjpn commented 1 year ago

Note:

English

For https://tatoeba.org/en/sentences/show/3756600, the old standard audio link works: https://audio.tatoeba.org/sentences/eng/3756600.mp3

Korean

For https://tatoeba.org/en/sentences/show/8365857, the same link pattern doesn't work: https://audio.tatoeba.org/sentences/kor/8365857.mp3
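The two links above suggest the old pattern `https://audio.tatoeba.org/sentences/<lang>/<sentence_id>.mp3`. Here is a rough sketch for checking a batch of sentences that way. The pattern is inferred from just these two examples, and the sketch assumes the server answers HTTP HEAD requests. `audio_url` and `audio_available` are hypothetical helper names.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumed base URL, inferred from the two example links in this thread.
AUDIO_BASE = "https://audio.tatoeba.org/sentences"


def audio_url(lang: str, sentence_id: int) -> str:
    """Build the old-style audio URL for a sentence."""
    return f"{AUDIO_BASE}/{lang}/{sentence_id}.mp3"


def audio_available(lang: str, sentence_id: int, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request for the audio URL gets a 2xx response.

    HTTP errors (e.g. 404) count as unavailable; network errors
    (URLError, timeouts) are left to propagate to the caller.
    """
    req = Request(audio_url(lang, sentence_id), method="HEAD")
    try:
        with urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except HTTPError:
        return False
```

For example, `audio_available("eng", 3756600)` should be true and `audio_available("kor", 8365857)` false if the two links behave as described above.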

jiru commented 1 year ago

I am asking Eunhee.

jiru commented 1 year ago

About Eunhee’s audio recordings:

A snapshot from September 29 shows the audio was already missing by then. That narrows the disappearance down to somewhere between Aug. 10 and Sep. 29, 2022.
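The narrowing above amounts to taking the last snapshot where the audio was present and the first later snapshot where it was missing. Here is a minimal sketch using the snapshot dates mentioned in this thread. It assumes the audio disappeared once and never came back, and `disappearance_window` is a hypothetical helper.

```python
from datetime import date


def disappearance_window(snapshots: list[tuple[date, bool]]) -> tuple[date, date]:
    """Return (last date audio was seen, first later date it was missing).

    `snapshots` pairs each archive snapshot date with whether the audio
    was present. Assumes a single disappearance with no reappearance.
    Raises ValueError if there is no present snapshot or no later missing one.
    """
    last_present = max(d for d, present in snapshots if present)
    first_missing = min(
        d for d, present in snapshots if not present and d > last_present
    )
    return last_present, first_missing


snapshots = [
    (date(2022, 8, 10), True),
    (date(2022, 9, 29), False),
    (date(2023, 1, 27), False),
]
print(disappearance_window(snapshots))
```

Each additional snapshot found inside the window can only shrink it, which is why the September snapshot tightens the range given by the August and January ones.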

jiru commented 1 year ago

Eunhee replied that she had explicitly asked for her audio contributions to be removed.

ckjpn commented 1 year ago

I deleted audio - kor - by Eunhee so that it won't cause confusion in the future.