Closed chrisemezue closed 2 years ago
I had a quick look at the review document. Are the following words easily pronounceable by most people? Right now the error rate is 0%, which would be very astonishing to me, to be honest. Because of words that are hard to pronounce, and for some other reasons, we allow a certain percentage of error. I haven't seen 0 yet though :)
Also, I've noticed some Chinese characters: Akpaala okwu ahụ bụ 沉魚落雁,閉月羞花.
I think this is a good example of why allowed_symbols should generally be preferred. Excluding all Chinese symbols would not work out well.
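To illustrate the point, here is a minimal Python sketch comparing an allowlist with a blocklist of characters. It is not the extractor's actual implementation: the character set in ALLOWED, the symbols in DISALLOWED, and both helper names are assumptions made for this example, and the real Igbo rules may list different symbols.

```python
import re

# Hypothetical allowlist: basic Latin letters, Igbo letters with diacritics,
# combining tone marks, spaces and common punctuation. Illustrative only.
ALLOWED = re.compile(
    r"[a-zA-ZịỊọỌụỤṅṄàèìòùÀÈÌÒÙáéíóúÁÉÍÓÚ\u0300\u0301\u0323 .,?!'\-]+"
)

# Hypothetical blocklist: a handful of symbols someone thought to exclude.
DISALLOWED = set("@#$%&*<>[]{}")


def passes_allowlist(sentence):
    """True only if every character is in the allowed set."""
    return ALLOWED.fullmatch(sentence) is not None


def passes_blocklist(sentence):
    """True if no character is in the disallowed set."""
    return not any(ch in DISALLOWED for ch in sentence)


mixed = "Akpaala okwu ahụ bụ 沉魚落雁,閉月羞花."
clean = "Dọkịta nke Nkà Ihe Ọmụma mmemme bụ isi site na nyocha."

blocklist_accepts_mixed = passes_blocklist(mixed)
allowlist_accepts_mixed = passes_allowlist(mixed)
allowlist_accepts_clean = passes_allowlist(clean)
```

With a blocklist, the Chinese characters slip through unless every one of them is listed; with an allowlist, anything not explicitly permitted is rejected automatically.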
Finally, the reviews are in. See them here. @MichaelKohler @stefangrotz #160
I'm so sorry, I mistakenly closed it. I added this comment to reopen it.
Thanks for the review. I'm a bit confused though, as all sentences say "OK" but there are errors reported by the reviewers on the right side of the sheet. How did that work?
For example, on row 16 there is a sentence with "(Ph.D)" which should not be included as a valid sentence, as it's an abbreviation. Did that count as an error?
coma with all English letters is acceptable. So also is koma. Another example is Nigeria, which also has Naigeria, Naijiria. So the comment is just the reviewer highlighting that there are other variants. That's probably why it was left as OK, because the original is okay.
Dọkịta nke Nkà Ihe Ọmụma (Ph.D.) mmemme bụ isi site na nyocha.
With PhD (which, as I checked, is the only example with an abbreviation), the Dọkịta nke Nkà Ihe Ọmụma is actually the Igbo meaning of PhD. So it's like a way of saying World Health Organization (WHO), where you say the full meaning and then the abbreviation. Also, I understood that the reason why abbreviations are not accepted is the ambiguity in pronouncing them (for example ICE, which could be I.C.E or ice). But in Igbo, words like PhD are always pronounced letter by letter. Notwithstanding, I shall correct it to take the PhD as an error.
Understood. I think this is okay then. Thanks for all the effort!
How many sentences did you get at the end?
After a successful run and some tweaks to the original outputs, as well as help from @MichaelKohler and @stefangrotz, the code now generates 2165 sentences of good accuracy upon review.
Here are all the steps I took.
How did you create the blocklist file?
Based on the guidelines provided in the README, and by filtering out words with a frequency of 20 or less. A frequency threshold of 80 reduced our sentences to ~400. We checked and found that 20 was also giving us good sentences. I did not tweak the ig.toml file much.
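For readers who want to reproduce the idea, the frequency filter can be sketched roughly as below. This is a minimal Python illustration, not the script that was actually used: the function name build_blocklist, the whitespace tokenisation, and the toy sentences are all assumptions made for this example.

```python
import string
from collections import Counter


def build_blocklist(sentences, max_frequency=20):
    """Return the words that occur max_frequency times or fewer."""
    counts = Counter()
    for sentence in sentences:
        for token in sentence.lower().split():
            # Strip surrounding ASCII punctuation; letters with diacritics are kept.
            word = token.strip(string.punctuation)
            if word:
                counts[word] += 1
    return sorted(word for word, n in counts.items() if n <= max_frequency)


# Toy data with a tiny threshold, only to show the mechanics.
toy_sentences = [
    "Nigeria bụ obodo.",
    "Nigeria dị mma.",
    "Naijiria bụ obodo.",
]
toy_blocklist = build_blocklist(toy_sentences, max_frequency=1)
```

Raising max_frequency puts more words on the blocklist, so more sentences are rejected; that matches the observation above that a threshold of 80 left far fewer sentences than a threshold of 20.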
Results from the reviews: a spreadsheet of 800 random sentences has been reviewed. The results of the review are here.