in my research, i work a lot with word alignments – links between the index of a word in one sentence and the index of the equivalent word in a translation of the same sentence.
勇気 は どこ に ? きみ の 胸 に !
2 1 0 -1 3 5 5 6 4 7
where is courage ? in your heart !
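to make the example concrete, here is a minimal python sketch of how those three lines could be read. it assumes that tokens are split on whitespace (with the ? as its own token), that the indices are 0-based positions in the english sentence, and that -1 marks a word with no aligned counterpart; the function name `aligned_pairs` is made up for illustration.

```python
# the example sentence pair from above, split on whitespace
source = "勇気 は どこ に ? きみ の 胸 に !".split()
target = "where is courage ? in your heart !".split()
alignment = [int(i) for i in "2 1 0 -1 3 5 5 6 4 7".split()]


def aligned_pairs(source, alignment, target):
    """pair each source word with the target word it points to.

    assumption: a link of -1 means the source word is unaligned,
    so it gets paired with None instead of a target word.
    """
    pairs = []
    for word, link in zip(source, alignment):
        pairs.append((word, target[link] if link >= 0 else None))
    return pairs


for word, match in aligned_pairs(source, alignment, target):
    print(word, "->", match)
```

read this way, 勇気 links to "courage", どこ to "where", and the first に has no counterpart.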
these alignments are meant to link two translations of the same sentence together, but they could just as well be pointed at a different sentence entirely.
for nanogenmo 2024, i want to write a little program that traverses a corpus of aligned sentences, starting at the first word of the first sentence and following its alignment to the next sentence in the corpus, weaving fragments of each sentence into a connected string of words.
since all aligned corpora are (at least) bilingual, two weaves can be made at the same time – woven in different languages, but formed by fragments from the same sentences in the same order – starting from the same point, diverging quicker the more different the two languages are, sometimes briefly woven back together by chance.
the resulting weaves will end up being mostly ungrammatical and incomprehensible on their own, but maybe more meaningful together.
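one possible reading of the traversal, as a rough python sketch rather than the final program: take one word from each sentence, then reuse that word's alignment index as the entry point into the next sentence. everything beyond that is an assumption made for illustration: the name `weave`, taking a single word as the "fragment", wrapping indices that run past the end of a shorter sentence, and staying at the same position when a word is unaligned (-1). it covers only one of the two weaves.

```python
def weave(corpus, start=0):
    """follow alignment links from sentence to sentence.

    corpus is a list of (tokens, alignment) pairs, one per sentence.
    returns the woven words joined by spaces.
    """
    words = []
    pos = start
    for tokens, alignment in corpus:
        if not tokens:
            continue  # assumption: skip empty sentences
        pos = pos % len(tokens)  # assumption: wrap around short sentences
        words.append(tokens[pos])
        link = alignment[pos]
        if link >= 0:  # assumption: an unaligned word keeps the position
            pos = link
    return " ".join(words)


# toy corpus: the example sentence from above, repeated three times
tokens = "勇気 は どこ に ? きみ の 胸 に !".split()
alignment = [2, 1, 0, -1, 3, 5, 5, 6, 4, 7]
corpus = [(tokens, alignment)] * 3

print(weave(corpus))  # 勇気 どこ 勇気
```

with a real corpus each step would land in a different sentence, so the repeated sentence here only shows how the index is carried along.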