NaNoGenMo / 2023

National Novel Generation Month, 2023 edition.

The Backrooms: A Novel #11

Open umm-maybe opened 10 months ago

umm-maybe commented 10 months ago

I've been toying with the idea of a Backrooms/SCP bot (possibly with some cosmic and Gothic horror elements mixed in) for a while, but it actually seems like this would be a more appropriate venue. Respecting copyright and the wishes of original authors has been central to my data collection efforts. Curious to see if I can re-purpose a code LLM trained on open-licensed data for this project.

umm-maybe commented 10 months ago

I took the (unusual, for generative AI) step of reaching out to some authors of articles on the Backrooms wikis to ask if they would consent to my using their work in fine-tuning an open-licensed LLM. While there were a couple of authors on the Backrooms Discord who were fine with this, the only responses I got from Wikidot authors were negative. So, even though all those data are CC-licensed and this non-commercial endeavor likely falls within fair use anyway, I'm inclined to pivot away from just training a machine learning model on a wiki-based corpus.

What I have in mind is actually, to me, a little more exciting. I've modified the parser I originally wrote to prepare LLM training data so that it now catalogs the entrances and exits of each of the many "Levels" of the Backrooms described on the Fandom wiki. The resulting data structure is, in effect, a network graph: an entire world that a narrative-building agent could traverse. I may still use an LLM, but I would use the wiki content differently: the description of the current Level would go into the prompt as context for an instruction like "write a chapter in which our protagonist, {description_of_protagonist}, explores {current_level}, making their way to {randomly_chosen_exit}." So basically, it's making up the story of a random walk through the Backrooms, taken by some unfortunate character who has accidentally no-clipped out of reality. Since that could happen to anybody, the character could be randomly generated, and multiple characters could wander the Backrooms at the same time, possibly bumping into one another and interacting. Or not: there are so many Levels, and the Levels themselves are so maze-like, that they likely wouldn't ever meet unless they entered together.
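To make the idea concrete, here is a minimal sketch of the random walk and prompt construction. It is not the code from the project: the `LEVELS` dictionary, its placeholder descriptions, and the `random_walk` and `build_prompts` helpers are all hypothetical stand-ins for whatever the parser actually produces.

```python
import random

# Hypothetical hand-made sample of the level graph. In the real project this
# would come from the wiki parser; the descriptions here are placeholders.
LEVELS = {
    "Level 0": {"description": "placeholder description of Level 0",
                "exits": ["Level 1", "Level 2"]},
    "Level 1": {"description": "placeholder description of Level 1",
                "exits": ["Level 0", "Level 2"]},
    "Level 2": {"description": "placeholder description of Level 2",
                "exits": ["Level 0", "Level 1"]},
}

# Prompt wording follows the instruction quoted above; the "Setting:" prefix
# is an assumption about how the level description is supplied as context.
PROMPT_TEMPLATE = (
    "Setting: {level_description}\n\n"
    "Write a chapter in which our protagonist, {description_of_protagonist}, "
    "explores {current_level}, making their way to {randomly_chosen_exit}."
)


def random_walk(levels, start, steps, rng):
    """Return a list of (current_level, chosen_exit) pairs, one per chapter."""
    path = []
    current = start
    for _ in range(steps):
        # Only follow exits that lead to a level we actually have data for.
        exits = [e for e in levels[current]["exits"] if e in levels]
        if not exits:
            break  # dead end: the walk (and the story) stops here
        chosen = rng.choice(exits)
        path.append((current, chosen))
        current = chosen
    return path


def build_prompts(levels, path, protagonist):
    """Turn each step of the walk into one chapter prompt for the LLM."""
    return [
        PROMPT_TEMPLATE.format(
            level_description=levels[current]["description"],
            description_of_protagonist=protagonist,
            current_level=current,
            randomly_chosen_exit=chosen,
        )
        for current, chosen in path
    ]


if __name__ == "__main__":
    rng = random.Random(2023)  # seeded so a run can be reproduced
    walk = random_walk(LEVELS, "Level 0", steps=3, rng=rng)
    for prompt in build_prompts(LEVELS, walk, "a tired night-shift nurse"):
        print(prompt, end="\n\n")
```

Each prompt would then be sent to whichever LLM is chosen, one chapter per step. Several wandering characters could be simulated by running one walk per character and checking whether two of them occupy the same Level at the same step.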

If I'm trying to be as copyright-respecting as possible, I'll need an LLM trained on open-licensed data that can generate stories. I'm a fan of StarCoder, but for this application a more general model like Phi 1.5 (or the newly released Phi 2, if it still omits CommonCrawl) might be more appropriate. If the quality of the output is low, it may need to be fine-tuned on some repository of short stories; this is where I'm thinking some cosmic horror / Gothic fiction from Project Gutenberg and/or Wikisource might come in handy.

umm-maybe commented 9 months ago

Better late than never; the code is still running, because I made the weird choice to run LLM inference locally. But here it is: https://github.com/umm-maybe/backrooms-chronicles

Will share the "novel" as soon as it's done.

umm-maybe commented 9 months ago

Finally finished! The output is here: https://github.com/umm-maybe/backrooms-chronicles/blob/main/the_backrooms_chronicles.md

I had to touch up some of the output by hand: it was the result of a bug I fixed after the script was well underway, and I didn't want to start over. Hopefully that's not breaking the rules. Actually, I'd like to continue pecking away at this artifact, working on it as a human editor and illustrator. Hopefully that's not breaking the rules either, but then again it's no longer November, so who cares!

I've also drafted a Medium article about the experiences that led to this project and the reasoning behind my approach. It is unlisted for now; I would appreciate feedback before I touch it up and publish sometime in the coming week: https://medium.com/@matthewmaybe/data-dignity-is-difficult-64ba41ee9150