NaNoGenMo / 2024

National Novel Generation Month, 2024 edition.

Weaving word alignments #11

Open amandakann opened 2 weeks ago


in my research, i work a lot with word alignments – links between the index of a word in one sentence and the index of the equivalent word in a translation of the same sentence.

勇気 は どこ に ? きみ の 胸 に !
2 1 0 -1 3 5 5 6 4 7
where is courage ? in your heart !
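reading the example, the middle line appears to give one number per japanese word: the (0-based) position of its english equivalent, with -1 marking a word that has no equivalent (the particle に at position 3). a small python sketch that zips the three lines back into word pairs; the name `aligned_pairs` is just for illustration:

```python
# Pair each source word with the target word its alignment index points at.
# Assumption: -1 means "no equivalent in the target sentence".
source = "勇気 は どこ に ? きみ の 胸 に !".split()
alignment = [int(i) for i in "2 1 0 -1 3 5 5 6 4 7".split()]
target = "where is courage ? in your heart !".split()


def aligned_pairs(source, alignment, target):
    """Return (source_word, target_word) pairs, with None for unaligned words."""
    pairs = []
    for word, idx in zip(source, alignment):
        pairs.append((word, target[idx] if idx >= 0 else None))
    return pairs


for src, tgt in aligned_pairs(source, alignment, target):
    print(f"{src} -> {tgt if tgt is not None else '(unaligned)'}")
```

note that the mapping doesn't have to be one-to-one: きみ and の both point at "your".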

these alignments are meant to link two translations of the same sentence together, but they could just as well point to a different sentence entirely.

for nanogenmo 2024, i want to write a little program that traverses a corpus of aligned sentences: it starts at the first word of the first sentence, follows that word's alignment into the next sentence in the corpus, and so on, weaving fragments of each sentence into one connected string of words.

since all aligned corpora are (at least) bilingual, two weaves can be made at the same time – woven in different languages, but formed from fragments of the same sentences in the same order. both start from the same point, diverge faster the more different the two languages are, and are sometimes briefly woven back together by chance. the resulting weaves will mostly be ungrammatical and incomprehensible on their own, but maybe more meaningful together.
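one way the traversal could look, as a rough python sketch. the corpus format, the fragment length, and what happens when a fragment has no aligned word or the next sentence is too short are all assumptions made for this sketch, not part of the plan above. each weave copies a fragment of `fragment_len` words, then reuses the alignment index of the last aligned word in it as the starting position in the next sentence. the second-language weave follows the alignment backwards through a hypothetical `invert` helper. the second toy sentence (君 は 誰 ? / who are you ?) is made up for illustration.

```python
from typing import List, Tuple

# A sketch under assumptions: the names and jump rules below are illustrative.
# Hypothetical corpus entry: (source tokens, alignment, target tokens), where
# alignment[j] is the index of the target word aligned to source word j, or -1.
Sentence = Tuple[List[str], List[int], List[str]]


def invert(alignment: List[int], target_len: int) -> List[int]:
    """For each target index, return the first source index aligned to it (-1 if none)."""
    inverse = [-1] * target_len
    for j, k in enumerate(alignment):
        if 0 <= k < target_len and inverse[k] == -1:
            inverse[k] = j
    return inverse


def next_position(links: List[int]) -> int:
    """Jump target: the link of the last aligned word in the fragment.

    Assumption: if nothing in the fragment is aligned, restart at position 0.
    """
    for k in reversed(links):
        if k >= 0:
            return k
    return 0


def weave(sentences: List[List[str]], links: List[List[int]], fragment_len: int = 2) -> List[str]:
    """Weave one language by copying a fragment, then following its alignment."""
    woven: List[str] = []
    pos = 0  # start at the first word of the first sentence
    for tokens, sent_links in zip(sentences, links):
        if not tokens:
            continue  # assumption: skip empty sentences
        pos = min(pos, len(tokens) - 1)  # assumption: clamp if the sentence is too short
        end = min(pos + fragment_len, len(tokens))
        woven.extend(tokens[pos:end])
        pos = next_position(sent_links[pos:end])
    return woven


def weave_both(corpus: List[Sentence], fragment_len: int = 2) -> Tuple[List[str], List[str]]:
    """Make the two parallel weaves: one per language, same sentences, same order."""
    sources = [src for src, _, _ in corpus]
    targets = [tgt for _, _, tgt in corpus]
    src_links = [align for _, align, _ in corpus]
    tgt_links = [invert(align, len(tgt)) for _, align, tgt in corpus]
    return (weave(sources, src_links, fragment_len),
            weave(targets, tgt_links, fragment_len))


corpus: List[Sentence] = [
    ("勇気 は どこ に ? きみ の 胸 に !".split(),
     [2, 1, 0, -1, 3, 5, 5, 6, 4, 7],
     "where is courage ? in your heart !".split()),
    # made-up second sentence, for illustration only
    ("君 は 誰 ?".split(), [2, 1, 0, 3], "who are you ?".split()),
]

ja, en = weave_both(corpus, fragment_len=4)
print(" ".join(ja))
print(" ".join(en))
```

with `fragment_len=4` the two weaves part ways after the first sentence: the japanese fragment ends on the unaligned に, so it falls back to どこ's link (position 0), while the english fragment ends on ?, whose link points back to position 4.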