NaNoGenMo / 2020

National Novel Generation Month, 2020 edition.

Markov text with citations #18

serin-delaunay opened this issue 3 years ago

serin-delaunay commented 3 years ago

A common criticism of GPT language models is that they plagiarise text from the internet. As an experiment in smoothing over this issue, I will make a Markov chain language model that tags each n-gram observation with the location of the original in the source text.
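A minimal sketch of what that tagging might look like (my own assumption: word tokens indexed by position; the actual entry might key observations to line identifiers or byte offsets instead):

```python
from collections import defaultdict

def build_tagged_model(tokens, n=2):
    """Map each n-gram context to its observed continuations,
    tagging every observation with its position in the source."""
    model = defaultdict(list)
    for i in range(len(tokens) - n):
        context = tuple(tokens[i:i + n])
        next_token = tokens[i + n]
        # Record where this observation occurs in the source text,
        # so generated tokens can cite it later.
        model[context].append((next_token, i))
    return model
```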

This means that in the text generation stage, each output token can cite the n-gram it was drawn from in the source text. In the generated novel, I'll put this info in footnotes. This should make the resulting text much better sourced, and give the reader clarity about the true origin of any deep insights found in the novel.
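Roughly, generation could then look like this (again just a sketch; the footnote rendering here is a placeholder, and `build_tagged_model` is the helper sketched above):

```python
import random

def generate_with_citations(model, seed, length=50):
    """Generate tokens from the tagged model; each output token
    carries the source index of the observation it was drawn from."""
    context = tuple(seed)
    output, footnotes = [], []
    for _ in range(length):
        observations = model.get(context)
        if not observations:
            break  # dead end: this context was never continued in the source
        token, source_index = random.choice(observations)
        footnotes.append(source_index)
        output.append(f"{token}[{len(footnotes)}]")
        context = context[1:] + (token,)
    return " ".join(output), footnotes
```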

Haven't decided what source text to use. Maybe Shakespeare (all lines have a standard identifier), GPT research papers, Moby Dick...

Caveats:

serin-delaunay commented 3 years ago

If there's time I might also do a slightly more serious separate entry that doesn't boil down to "YAMC".

pjfpotter commented 3 years ago

Why not write an entire novel of footnotes? Each footnote is a citation of the n-gram that would have been in the novel but then wasn't, because it was replaced by its own citation. Let's see how deep this rabbit hole goes.

serin-delaunay commented 3 years ago

There's one like that at https://github.com/NaNoGenMo/2019/issues/68; I'd rather keep this one simple. The footnotes will have a pretty well-defined format, so they wouldn't need to be Markov-generated or nested.

greg-kennedy commented 3 years ago

This is the one that comes to mind when I think of obsessive footnotes: https://github.com/NaNoGenMo/2019/issues/127

serin-delaunay commented 3 years ago

Yeah, that's closer to what I'm going for here. Thanks for the link, I saw that one last year but it had slipped my mind.

verachell commented 3 years ago

What a cool idea!