common-voice / sentence-collector

Tool to collect and review sentences for Common Voice
https://commonvoice.mozilla.org/sentence-collector/
Mozilla Public License 2.0

The API doesn't find the sentence in Thai #634

Closed wannaphong closed 1 year ago

wannaphong commented 1 year ago

Describe the bug: The API endpoint /sentences/{localeId} doesn't find sentences in Thai. I tried looking up several Thai sentences, but none of them were found.

The list of sentences that were not found: https://gist.github.com/wannaphong/31e9d34173a7a826d14910b11e66385e

To Reproduce: Steps to reproduce the behavior:

  1. Copy "คุณจะทำอะไรกับเงินในกระเป๋าของคุณเดือนนี้ ?" from https://raw.githubusercontent.com/common-voice/common-voice/main/server/data/th/sentence-collector.txt
  2. Paste it into the API box (/sentences/{localeId}) at https://commonvoice.mozilla.org/sentence-collector/api/
  3. Run the request.

Expected behavior: The API should return the sentence.


MichaelKohler commented 1 year ago

There are certain transformations done when exporting: https://github.com/common-voice/sentence-collector/blob/main/server/lib/cleanup/languages/th.js#L36

As you wrote on Matrix, the sentence in Sentence Collector doesn't have the space before the question mark.
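For anyone matching exported sentences against Sentence Collector outside of JS, here is a minimal Python sketch of a client-side workaround. It assumes the only difference is the one mentioned in this thread (a space before the question mark in the exported text); the real cleanup rules in th.js may do more. `lookup_candidates` is a hypothetical helper, not part of any Common Voice API, and it only builds the candidate strings to try, without making any request.

```python
import re
from typing import List

# Assumption: the export only differs by whitespace before "?".
# The actual rules in th.js may apply further transformations.
_SPACE_BEFORE_QMARK = re.compile(r"\s+\?")


def lookup_candidates(exported: str) -> List[str]:
    """Hypothetical helper: return the exported sentence, followed by a
    variant with the whitespace before '?' removed (if that differs)."""
    candidates = [exported]
    collapsed = _SPACE_BEFORE_QMARK.sub("?", exported)
    if collapsed != exported:
        candidates.append(collapsed)
    return candidates


if __name__ == "__main__":
    exported = "คุณจะทำอะไรกับเงินในกระเป๋าของคุณเดือนนี้ ?"
    for candidate in lookup_candidates(exported):
        print(repr(candidate))
```

Each candidate can then be tried in turn against /sentences/{localeId} until one matches.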

wannaphong commented 1 year ago

I think the exported sentences should map back to the Sentence Collector API. Right now, a sentence still can't be matched by a direct text-to-text comparison. I think Common Voice should update the sentences so that they can be mapped to the Sentence Collector API.

wannaphong commented 1 year ago

@MichaelKohler I think this issue shouldn't be closed yet, because the sentences still can't be matched against the API sentence-to-sentence. Not all developers know about this problem or use JS.

MichaelKohler commented 1 year ago

I started a thread on Discourse about this: https://discourse.mozilla.org/t/sentence-collector-cleanup-before-export-vs-cleanup-on-upload/105411

MichaelKohler commented 1 year ago

Thanks for bringing this up. The Sentence Collector has now moved to https://commonvoice.mozilla.org/write and is therefore hosted in the main Common Voice repository. As a result, this issue is no longer relevant, since that part of the code has been rewritten.