Toki Pona Book written by a machine and interpreted by humans

dariusk / NaNoGenMo

National Novel Generation Month. Because.

Toki Pona Book written by a machine and interpreted by humans #73

Open lilinx opened 10 years ago

lilinx commented 10 years ago

In progress.

The basic idea here is to generate a random text in a constructed language, then ask humans to translate it into a natural language, thus bringing meaning into it.

I'm now experimenting with wiki hosting sites to find the right place to host such a thing. I created these two wikis, where login is not required to contribute:

Oddwiki: http://www.oddwiki.org/odd/OddList/sitelenpinimimute
Wikia: http://sitelen-pi-nimi-mute.wikia.com/wiki/Wiki_Content

The first step is to generate a 50k-word Markov-chain text in Toki Pona (http://en.wikipedia.org/wiki/Toki_Pona).

Toki Pona is an extremely fun constructed language that works with exactly 123 words. You can learn the basics in minutes.

A Toki Pona Markov chain is very syntax-consistent: because of the very flexible syntax of Toki Pona, most words can serve as any part of speech. It would also be very easy to design a parsing script that removes the few incorrect sentences from the text. But I also like the idea of having a few mistakes in the novel.

Because of the restricted vocabulary, any sentence written in Toki Pona can have multiple interpretations.
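
The word-level Markov chain mentioned above could look roughly like the following. This is only a minimal sketch, not the code used for this project (none was published in the thread): the six-sentence `CORPUS`, the `<s>`/`</s>` markers, and all function names are assumptions made for illustration, and a real run would need a far larger body of Toki Pona text.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus (an assumption for this sketch).
CORPUS = [
    "mi moku e kili",
    "sina lukin e telo",
    "jan li moku e kili",
    "jan li lukin e telo suli",
    "telo li pona",
    "kili li pona tawa mi",
]

START, END = "<s>", "</s>"  # sentence boundary markers (hypothetical names)


def build_chain(sentences):
    """Map each word to the list of words observed directly after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = [START] + sentence.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate_sentence(chain, rng, max_words=20):
    """Walk the chain from START until END is drawn or max_words is reached."""
    words = []
    current = START
    while len(words) < max_words:
        current = rng.choice(chain[current])
        if current == END:
            break
        words.append(current)
    return " ".join(words)


def generate_text(chain, target_words, seed=0):
    """Generate sentences until at least target_words words have been produced."""
    rng = random.Random(seed)
    sentences, count = [], 0
    while count < target_words:
        sentence = generate_sentence(chain, rng)
        sentences.append(sentence)
        count += len(sentence.split())
    return sentences


if __name__ == "__main__":
    chain = build_chain(CORPUS)
    for line in generate_text(chain, target_words=30):
        print(line)
```

Calling `generate_text(chain, 50_000)` would give the target length, although with a corpus this small the output would be extremely repetitive.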

The idea is to generate the book, possibly run a syntax check on it, then publish it online and ask everybody (I mean... the international community of Toki Pona speakers...) to contribute to its apophenic "translation".

People could build on the same interpretation, writing a consistent novel together, or on the contrary they could fight over the meaning of the generated text, and we could see completely different stories emerge from the same original source.

Just wondering if anybody has suggestions on how to organize this.

enkiv2 commented 10 years ago

I'm glad somebody is doing the conlang thing, finally (although I was expecting somebody to just up and generate grammatically correct Lojban and use one of the existing automatic translators to produce broken English).

Could you automate the translation by choosing arbitrary part-of-speech interpretations and then choosing arbitrary translations of the words based on the assumption that they fall into those parts of speech? Toki Pona may be a little too fuzzy to try to do that.

lilinx commented 10 years ago

Do you mean lojban is also easy to generate?

The idea here is more to have humans do the translation manually (once the source is generated), but I have also been thinking about automated translation of Toki Pona.

I guess parts of speech would be easy to identify because they mostly depend on the position of the word in the sentence. The problem is the meaning of the words: you would still be stuck with a few hundred possible translations. Or you could assign several translations to each word and pick one at random each time the word comes up.

Or maybe a Markov-translation system where the interpretation for one word would be chosen on a statistical basis (depending on the previous translations chosen).
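
Both ideas above (picking a gloss at random, and letting the previous choice bias the next one) can be sketched in a few lines. This is a hedged illustration only: the `GLOSSES` table, the `PAIR_COUNTS` statistics, and the function names are all invented for the example, and particles such as `li` and `e` are simply skipped rather than translated.

```python
import random

# Hypothetical gloss table: a few possible English renderings per word.
GLOSSES = {
    "mi": ["I", "we"],
    "sina": ["you"],
    "moku": ["eat", "food", "drink"],
    "kili": ["fruit", "vegetable"],
    "telo": ["water", "liquid", "wash"],
    "pona": ["good", "simple", "fix"],
}

# Hypothetical counts of how often one gloss followed another in
# earlier translations; in practice these would be collected from data.
PAIR_COUNTS = {
    ("I", "eat"): 8,
    ("eat", "fruit"): 6,
    ("drink", "water"): 7,
}


def pick_random(words, rng):
    """Pick one gloss uniformly at random for every known word."""
    return [rng.choice(GLOSSES[w]) for w in words if w in GLOSSES]


def pick_markov(words, rng):
    """Pick glosses one by one, weighting each option by how often it
    followed the previously chosen gloss (plus 1 so nothing is impossible)."""
    chosen = []
    previous = None
    for word in words:
        options = GLOSSES.get(word)
        if not options:
            continue  # unknown words and particles are skipped
        weights = [1 + PAIR_COUNTS.get((previous, option), 0) for option in options]
        previous = rng.choices(options, weights=weights, k=1)[0]
        chosen.append(previous)
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "mi moku e kili".split()
    print(pick_random(sentence, rng))
    print(pick_markov(sentence, rng))
```

The `1 +` smoothing keeps unseen gloss pairs possible, so the output stays ambiguous instead of collapsing onto the most frequent reading.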

This is all way beyond my skills and objectives at the moment.

enkiv2 commented 10 years ago

Lojban isn't easy to generate with a Markov model, but it's almost one-to-one with first-order predicate logic, which means that you can trivially generate it from an ontology. Lojban is pretty much the anti-Toki Pona: it's very precise, and has the capacity to be almost entirely unambiguous (in addition to having the capacity to specify how ambiguous the speaker wishes to be with a granularity of one in sixty-five thousand or something).

As for an automated Toki Pona translator, I was thinking of assuming that any sequence of words would have exactly the same sequence of parts of speech, and then (for each word) having a finite set of static translations for each part of speech it can be interpreted as. If you then chunk words into fixed-length sentences (regardless of the original chunking), you can have the same sentence be interpreted completely differently later on because of alignment errors.

Looking through the Wikipedia page for Toki Pona, though, it looks like it has more of a grammar than I thought (it's chomsky-complete with subclauses!), which means simple alignment changes may not work.
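
The fixed-template idea described above can be sketched as follows. Everything here is an assumption made for illustration: the three-slot `TEMPLATE`, the per-part-of-speech glosses in `LEXICON`, and the function names. It deliberately ignores real Toki Pona grammar, which is exactly the limitation raised in the previous paragraph.

```python
# Hypothetical fixed part-of-speech template applied to every chunk.
TEMPLATE = ["noun", "verb", "noun"]

# Hypothetical static glosses, one per part of speech a word can fill.
LEXICON = {
    "jan": {"noun": "person", "verb": "humanizes"},
    "moku": {"noun": "food", "verb": "eats"},
    "telo": {"noun": "water", "verb": "washes"},
    "pona": {"noun": "goodness", "verb": "fixes"},
}


def chunk_words(words, size):
    """Cut the word stream into fixed-length chunks, ignoring the original
    sentence boundaries (a short final chunk is kept as-is)."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def translate_chunk(chunk):
    """Gloss each word according to the template slot it lands in."""
    return " ".join(LEXICON[word][pos] for word, pos in zip(chunk, TEMPLATE))


if __name__ == "__main__":
    # The same four-word sentence repeated three times: because the template
    # has three slots, each repetition is aligned differently.
    stream = "jan moku telo pona".split() * 3
    for chunk in chunk_words(stream, len(TEMPLATE)):
        print(" ".join(chunk), "->", translate_chunk(chunk))
```

Because the four-word sentence does not divide evenly into three-slot chunks, each repetition of it falls on different template slots and so receives a different reading.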

enkiv2 commented 10 years ago

I should actually clarify: you can generate grammatically incorrect Lojban with a Markov model, and not all automated translators will enforce all grammatical rules.

lilinx commented 10 years ago

Lojban sounds serious. Today is not the first time I have tried to understand what it is and failed. Toki Pona is extremely ambiguous. That's also what I like about the language, so I often like to overexploit that ambiguity.

Update on the project: Google Spreadsheet happens to have an "everybody can comment" option. This seems to be the easiest way to make an editable translation document that everybody can contribute to.

lilinx commented 10 years ago

This would be the thing : https://docs.google.com/spreadsheet/ccc?key=0AhYMrGFh0ECjdHNEYTlZZ2s0cU9PTHJDbVpjRVNXVnc&usp=sharing

enkiv2 commented 10 years ago

Do you want an English translation in a second column?

lilinx commented 10 years ago

I think I would rather let people annotate the cells by inserting comments. This way they could discuss tiny parts of the text, disagree or agree, etc. Then, when consistent passages have been translated (into English or any other language), I could display them myself in a second column. I have already started to submit a translation of the beginning myself. The way it slowly starts to make sense and everything happens to "fit in" is fascinating. Or maybe it's just driving me crazy.

enkiv2 commented 10 years ago

When you begin to turn crazy, everything becomes fascinating :-)

(Reading markov output until it starts to make sense is a very good way to exit consensus reality entirely. I've done it too many times!)

lilinx commented 10 years ago

Here is a sample of a very rough translation I did: the prologue of the book. (I assume the prologue depicts, in a highly poetic style, Columbus on the deck of the Española during the Atlantic crossing. He's vaguely dreaming about destiny while details of life on board cross his mind. He may actually be sitting in his cabin and writing his captain's log.)

"We keep our eyes on it: the glorious way. A long, long time. Riches that made Corporations and Houses powerful, have lost their value. Why are my sailors singing their song "Do it!" and shouting all over the place? All right, then! Only Muwama make such tiny moves, and eat so much. Meanwhile, awful birds... The wind is not blowing. Alone there, where won't he [or the winter we're waiting for] spread our message? Columbus! The one that left. Sosi Ewasimu. So many terrestrial globes! The men are hoisting the sails. Papanko is dead. You know it: the ocean is good and quiet. No one refuses the protection of his creek, and no one masters the ocean. You: Solomon."

This is on standby. The Toki Pona community seems only moderately interested in this project. I guess I should introduce it in a nicer way. Also, I now think it would be better to filter out all syntactically incorrect sentences from the book. Combined with Markov chains, this would produce a completely syntactically acceptable (and potentially meaningful) 50k-word novel. I will have to code a syntax parser. I don't think it will be difficult, but I'm not sure I'll do it in November.
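
A filter of the kind described above might start out like this. It is a rough sketch under loud assumptions: it checks only a handful of simplified rules about the particles `li`, `e`, and `pi`, it is not a real Toki Pona parser, and it would wrongly reject valid constructions it does not know about (commands with `o`, `la` clauses, interjections). The function names are invented for the example.

```python
# Particles covered by this toy check (an assumption; the language has more).
PARTICLES = {"li", "e", "pi"}


def is_plausible(sentence):
    """Return True if the sentence passes a few simplified syntax heuristics."""
    words = sentence.split()
    if len(words) < 2:
        return False
    # A particle cannot open or close a sentence.
    if words[0] in PARTICLES or words[-1] in PARTICLES:
        return False
    # Two particles cannot stand next to each other.
    if any(a in PARTICLES and b in PARTICLES for a, b in zip(words, words[1:])):
        return False
    if "li" in words:
        subject = words[:words.index("li")]
        # A bare "mi" or "sina" subject takes no "li".
        if subject in (["mi"], ["sina"]):
            return False
        # The object marker "e" cannot appear inside the subject.
        if "e" in subject:
            return False
        return True
    # Without "li", the subject has to be "mi" or "sina".
    return words[0] in ("mi", "sina")


def filter_sentences(sentences):
    """Keep only the sentences that pass the heuristic check."""
    return [s for s in sentences if is_plausible(s)]


if __name__ == "__main__":
    generated = ["jan li moku e kili", "mi li moku", "jan moku e kili", "mi moku e kili"]
    print(filter_sentences(generated))
```

Applied to Markov output, a check like this would drop the rejected sentences and keep the rest; generating extra sentences up front would compensate for the ones removed.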

Zireael07 commented 2 years ago

What happened to the requirement that NaNoGenMo projects have publicly accessible source code?

enkiv2 commented 2 years ago

TBH nobody enforces anything because this is all just for fun? (I'm not sure what the point of doing this without releasing code is, but some people don't want to, or delay it...)

hugovk commented 2 years ago

Check the labels, this was "in progress" and was never completed.