NaNoGenMo / 2021

National Novel Generation Month, 2021 edition.

feature request: add Japanese #45

Open micuat opened 2 years ago

micuat commented 2 years ago

I'd like to participate, but I don't write in English, so I'd like to do something in Japanese (and break the word count).

enkiv2 commented 2 years ago

We have had a lot of entries in languages other than English, but none in Japanese that I know of. It should be exciting!

Do you know how you're going to count words?

On Sat, Nov 6, 2021, 9:49 AM Naoto HIÉDA @.***> wrote:

I'd like to participate, but I don't write in English, so I'd like to do something in Japanese (and break the word count).

micuat commented 2 years ago

I'm super behind but trying to catch up with the research... Since I use JavaScript (I don't want to use Python), I might remix a project like this: https://github.com/kylestetz/metaphorpsum/blob/master/routes/index.js#L170 Its use of template sentences would be an easy starting point for conversion between English and Japanese.

micuat commented 2 years ago

some weird things I tried: https://github.com/micuat/metaphorpsum/tree/nngm

en: 'In recent years, a pump is a seedy twist. This could be, or perhaps a cheese is a pudgy Sunday. If this was somewhat unclear, the first glary clave is, in its own way, a lotion. However, a postage is a dimming title. ',

ja: "近年、 パンプス(ひもや金具がなく,甲のあいた靴)は (果物などが)種の多い 〈糸・なわなど〉‘を'『よる』,より合わせる(糸・なわなどに)…‘を'よる《+名+into(in)+名》 である。恐らく、強いて言えば 『チーズ』は 小さくてずんぐりした 『日曜日』(キリスト教の安息日で週の第1日;《略》Sun.)である。それが不明瞭であれば、初めてのGLアRY cleaveの過去形は、ある意味では 外用水薬;化粧水,ローションだ。しかし、 『郵便料金』,郵送料は 『薄暗い』,ほの暗い 〈C〉(…の)『題名』,題目,題《+of(to)+名》 である。"

First I modified metaphorpsum so that it simply outputs a random text on the console. Then I added Japanese translations of the template sentences. By overriding Sentencer's actions, I store the randomly chosen nouns/adjectives on a stack and translate them into Japanese using ejdict.
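The idea of recording the random words and reusing them in a translated template can be sketched roughly as follows. This is not the code in the linked repository and it does not use Sentencer: `fillTemplate`, `fillTranslated`, `TINY_DICT` and the `{{ noun }}` / `{{ adjective }}` slot names are all hypothetical, and the sketch assumes the Japanese template has its slots in the same order as the English one.

```javascript
// Hypothetical mini-dictionary standing in for real ejdict lookups.
const TINY_DICT = { cheese: "チーズ", lotion: "ローション", seedy: "種の多い" };

// Fill {{ noun }} / {{ adjective }} slots and record every chosen word,
// in order, in `stack`.
function fillTemplate(template, words, pick, stack) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, slot) => {
    const list = words[slot];
    if (!list) return match; // unknown slot: leave it untouched
    const word = pick(list);
    stack.push(word);
    return word;
  });
}

// Fill the translated template by reusing the recorded words in the same
// order (assumption: both templates list their slots in the same order).
function fillTranslated(template, stack, dict) {
  let i = 0;
  return template.replace(/\{\{\s*\w+\s*\}\}/g, () => {
    const word = stack[i++];
    // Fall back to the English word when the dictionary has no entry.
    return dict[word] !== undefined ? dict[word] : word;
  });
}

// Example with a deterministic picker; a real run would pick randomly.
const words = { noun: ["cheese", "lotion"], adjective: ["seedy"] };
const pickFirst = (list) => list[0];
const stack = [];
const en = fillTemplate(
  "However, a {{ noun }} is a {{ adjective }} {{ noun }}.",
  words, pickFirst, stack
);
const ja = fillTranslated(
  "しかし、{{ noun }}は{{ adjective }}{{ noun }}である。",
  stack, TINY_DICT
);
console.log(en);
console.log(ja);
```

Keeping the chosen words in a separate array means the English and Japanese sentences always refer to the same nouns and adjectives, even though each language has its own template.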

Challenges:

Next steps:

micuat commented 2 years ago

here are my (close to final) results: english | japanese

I looked into the English-Japanese dictionary (ejdict) further. Its output looks like this:

make
----
 …‘を'『作る』,製造する,建造する
 …‘を'『整える』,用意する
 …‘を'生じさせる,もたらす,引き起こす
 〈金など〉‘を'得る,もうける,〈財産など〉‘を'作る
 《行為・動作を表す名詞を目的語にして》…‘を'『する』,行う
 (ある状態・形態に)…‘を'『する』
 《『make』+『名』+do》〈人・動物など〉‘に'強制して(…)させる

Since the output is very cluttered, it is difficult to simply replace an English word with it, so I started writing regular expressions to clean it up:

https://github.com/micuat/metaphorpsum/blob/8f4d502330ae284fdfeabb0d92a2fd260f0e91a8/app.js#L183-L202
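As a rough illustration of this kind of cleanup (not the regular expressions in the linked file), the sketch below strips the annotation markers seen in the `make` entry and keeps only the first gloss. The function name `firstGloss` and the exact set of patterns are assumptions made for this example, and the result is just as imperfect as described.

```javascript
// Reduce one ejdict definition line to its first plain gloss.
// The patterns are guesses based on the sample entry above.
function firstGloss(entryLine) {
  const cleaned = entryLine
    .replace(/《[^》]*》/g, "")        // usage notes such as 《…》
    .replace(/〈[^〉]*〉/g, "")        // object hints such as 〈金など〉
    .replace(/\([^)]*\)/g, "")        // parenthesised explanations
    .replace(/…?‘[^'’]*['’]/g, "")    // particle markers such as …‘を'
    .replace(/[『』]/g, "");           // keep the emphasised word, drop brackets
  const glosses = cleaned
    .split(/[,;、，；]/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  return glosses.length > 0 ? glosses[0] : null;
}

console.log(firstGloss("…‘を'『作る』,製造する,建造する"));
console.log(firstGloss("〈金など〉‘を'得る,もうける,〈財産など〉‘を'作る"));
console.log(firstGloss("(ある状態・形態に)…‘を'『する』"));
```

Taking only the first gloss is a deliberate simplification: it gives one replaceable word per entry, at the cost of ignoring every other sense.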

I spent an hour or so with regular expressions (and the result is still not perfect). Then I thought: what if I make a feature vector for each English word based on this process? For example, if the entry contains a particular pattern, turn on a flag, and turn on another flag for 《.*》. The vector effectively represents how cluttered the word is in an English-Japanese dictionary. (I had read an issue about word2vec at https://github.com/ml5js/ml5-library/issues/1238 and was looking for an alternative way to find words.)

This is how the program chooses a word: it stores the last word's feature vector, randomly picks a few words into a pool, and selects the word in the pool whose feature vector is closest. I increased the size of the pool with every chapter, so I expect the first chapter to look more random and the later chapters to contain words that are similar in how cluttered they are in the E-J dictionary. Note that only nouns/adjectives are randomized in the sentences; the rest comes from the template.
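A minimal sketch of this selection step, under stated assumptions: only the 《.*》 flag comes from the description above, while the other patterns, the Hamming distance, and the names `featureVector`, `distance` and `pickNext` are hypothetical choices made for illustration.

```javascript
// One 0/1 flag per pattern. Only 《.*》 is mentioned in the thread;
// the remaining patterns are assumptions.
const CLUTTER_PATTERNS = [/《.*》/, /〈.*〉/, /\(.*\)/, /『.*』/];

function featureVector(entry) {
  return CLUTTER_PATTERNS.map((re) => (re.test(entry) ? 1 : 0));
}

// Hamming distance between two flag vectors of equal length.
function distance(a, b) {
  return a.reduce((sum, flag, i) => sum + (flag !== b[i] ? 1 : 0), 0);
}

// Draw `poolSize` random candidates (with replacement) and keep the one
// whose feature vector is closest to the previous word's vector.
function pickNext(prevVector, candidates, poolSize, random = Math.random) {
  let best = null;
  let bestDist = Infinity;
  for (let i = 0; i < poolSize; i++) {
    const cand = candidates[Math.floor(random() * candidates.length)];
    const d = distance(prevVector, featureVector(cand.entry));
    if (d < bestDist) {
      best = cand;
      bestDist = d;
    }
  }
  return best;
}

// Entries copied from the sample output earlier in the thread.
const candidates = [
  { word: "make", entry: "《『make』+『名』+do》〈人・動物など〉‘に'強制して(…)させる" },
  { word: "cheese", entry: "『チーズ』" },
  { word: "lotion", entry: "外用水薬;化粧水,ローション" },
];
const previous = featureVector("『日曜日』");
console.log(pickNext(previous, candidates, 3).word);
```

With a pool of one, the choice is purely random; as the pool grows, the closest candidate is more likely to match the previous word's level of clutter, which is the effect of increasing the pool size per chapter.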

Currently the number of sentence templates is very small, so you can see a lot of repetition. I might work on that, but it won't be the core of the project. My interest now, as a Japanese person, is this: since most English education in Japan is based on reading, we are asked to look things up in dictionaries a lot. How does that shape Japanese people's competency in English, and how can I intervene in it?

micuat commented 2 years ago

For now, here are the final results, with more sentence templates added:

english | japanese