Treating vocabularies as numbering systems, and works composed from them as large numbers, to be manipulated.
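A minimal sketch of the idea, not the code from last year's entry: each distinct token is a digit, the vocabulary size is the base, and a text is the number spelled out by its tokens. The names `tokenise`, `build_vocab`, `encode` and `decode` are hypothetical, and the regex tokeniser is only a stand-in for illustration.

```python
import re

def tokenise(text):
    # Stand-in tokeniser (an assumption): lowercase words, plus single
    # punctuation marks as their own tokens.
    return re.findall(r"[a-z']+|[^\sa-z']", text.lower())

def build_vocab(tokens):
    # Sorted unique tokens; a token's index in this list is its digit value.
    return sorted(set(tokens))

def encode(tokens, vocab):
    # Read the tokens as digits in base len(vocab), most significant first.
    base = len(vocab)
    digit = {tok: i for i, tok in enumerate(vocab)}
    n = 0
    for tok in tokens:
        n = n * base + digit[tok]
    return n

def decode(n, vocab, length):
    # Peel off `length` digits from the least significant end, then reverse.
    # The length is passed in because leading zero digits are otherwise lost.
    base = len(vocab)
    out = []
    for _ in range(length):
        n, d = divmod(n, base)
        out.append(vocab[d])
    return out[::-1]

tokens = tokenise("the cat sat on the mat")
vocab = build_vocab(tokens)          # ['cat', 'mat', 'on', 'sat', 'the']
number = encode(tokens, vocab)       # 12946 (digits 4,0,3,2,4,1 in base 5)
halved = decode(number // 2, vocab, len(tokens))
# halved == ['on', 'cat', 'mat', 'sat', 'the', 'sat']
```

Any integer operation on `number` (halving here) yields another number that decodes to a new token sequence over the same vocabulary.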
Following some very good advice last year, I switched focus towards the end of the month to making sure I actually had 50k words in a readable format, rather than bug-free code that was pure and true to a half-baked concept only I was judging. It was a good exercise in project management: focus on the results that matter.
I was happy enough with the results. Some of the bugs in the tokenisation of the source material seemed to make the output more interesting, and my attempts to fix them resulted in (if I remember correctly) less interesting output. So I embraced the glitches and accomplished the goal of producing a generated novel using a simple arithmetic operation on a text.
This round I want to:
- Generalise the tokenisation so it is robust against many kinds of input (I'll be using a mix of properly edited text and some OCR'd source content).
- Formalise the tokenisation algorithm so it is repeatable and comprehensible.
- Overcome the challenge of converting a > 100K word text like Pride and Prejudice into an integer. With the current code this requires more than 4 GB of RAM.
- Work on a shared vocabulary across more than one source work (4) and try some more interesting averaging or combinations.
- Figure out whether there is a conceptually pure way to make the text output interesting, or whether the output will really be as interesting as reading a large integer.
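Two of the goals above can be sketched together, under assumptions: a vocabulary shared by several works, with an integer mean of their encodings, and a rough size estimate for a large text. The helper names (`shared_vocab`, `encode`, `decode`, `estimated_bits`) and the vocabulary size of 7,000 are hypothetical, chosen only for illustration.

```python
import math

def shared_vocab(token_lists):
    # One vocabulary covering every work, so all works share the same base.
    return sorted(set().union(*token_lists))

def encode(tokens, vocab):
    # Accumulate one digit at a time; only a single big integer is kept.
    base = len(vocab)
    digit = {tok: i for i, tok in enumerate(vocab)}
    n = 0
    for tok in tokens:
        n = n * base + digit[tok]
    return n

def decode(n, vocab):
    # Variable-length decode: stops when the number is used up, so leading
    # zero digits are dropped and 0 decodes to an empty list.
    base = len(vocab)
    out = []
    while n > 0:
        n, d = divmod(n, base)
        out.append(vocab[d])
    return out[::-1]

def estimated_bits(n_tokens, vocab_size):
    # A text of n_tokens digits in base vocab_size needs about
    # n_tokens * log2(vocab_size) bits.
    return math.ceil(n_tokens * math.log2(vocab_size))

works = [["a", "b"], ["b", "c", "a"]]
vocab = shared_vocab(works)                     # ['a', 'b', 'c']
numbers = [encode(w, vocab) for w in works]     # [1, 15]
mean_text = decode(sum(numbers) // len(numbers), vocab)  # ['c', 'c']

bits = estimated_bits(100_000, 7_000)           # roughly 1.28 million bits
```

By this estimate, a 100K word text over an assumed 7,000-token vocabulary is an integer of roughly 160 KB. That hints the memory cost may come from how the integer is built rather than from its final size, though this is a guess and not a diagnosis of the existing code.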
This is going to be a continuation of my ideas from last year in https://github.com/NaNoGenMo/2019/issues/65