Everyone knows that developing software using version control systems is one of the most breathtakingly exciting activities humans ever came up with. 🤩

So why not use git repositories as the base story line for novels, some more epic, some less? Every commit tells a story - stories of features and failures, of linting and loathing, resets and reverts. And where there is absolutely no story in sight the "author" might embellish a couple things here or there...

This is what I will attempt in https://github.com/dubbl/novelopment 😬

The current state after my first NaNoGenMo development evening, when applied to itself:

dubbl be like setup basic project structure dubbl be like add main.py and parse first repository dubbl be like add apache 2.0 license

What a read of 23 words! Just 49977 more to go!

Dev log 1

After some trouble getting pycorpora to work, the current state is still pretty close to the first one. I have added some more structure to the output, made a Novel and Chapter class, and added the first pseudorandomly picked adjective to the title. To get deterministic output for the same repository (and state), I use the latest commit hash to seed the PRNG, and I added an optional command line argument to override it.
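To illustrate the seeding idea, here is a minimal sketch. It is not the actual Novelopment code: `make_rng`, `make_title`, the `--seed` flag name, the word list and the commit hash are all made up for the example. The only real point is that the same hash always yields the same title.

```python
from __future__ import annotations

import argparse
import random

# Placeholder word list (assumption); the project itself uses pycorpora for this.
ADJECTIVES = ["skyrocketed", "remarkable", "excellent", "wondrous"]


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="seeding sketch")
    # Hypothetical flag name for the optional seed override.
    parser.add_argument("--seed", default=None, help="override the commit-hash seed")
    return parser.parse_args(argv)


def make_rng(latest_commit_hash: str, seed_override: str | None = None) -> random.Random:
    """Seed a private PRNG with the override if given, else with the commit hash."""
    seed = seed_override if seed_override is not None else latest_commit_hash
    # Seeding with a str is reproducible across runs.
    return random.Random(seed)


def make_title(repo_name: str, rng: random.Random) -> str:
    return f"The {rng.choice(ADJECTIVES)} story of {repo_name}"


args = parse_args([])  # pretend no --seed was passed
title = make_title("novelopment", make_rng("0123abcd", args.seed))  # made-up hash
print(title)
```

Using a private `random.Random` instance instead of the module-level functions keeps the novel reproducible even if some other library draws random numbers in between.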

I also read the Wikipedia page on Natural Language Generation, having been pointed there by Ehud Reiter's blog post "How do I Learn about NLG?". I would like to structure my code roughly along the stages listed there.

While trying to debug my packaging issues, I also found that https://github.com/MichaelPaulukonis already had the same idea for NaNoGenMo 2015! I didn't look too much into it for now, so as not to be influenced, but I will definitely check it out at the end of the month :) At least the preview snippet looked remarkably like my current output, when applied to its own project. :D

The skyrocketed story of novelopment

By Novelopment 0.1

All commits

dubbl be like setup basic project structure dubbl be like add main.py and parse first repository dubbl be like add apache 2.0 license dubbl be like add more structure to novel output dubbl be like use pycorpora for random adjective in title dubbl be like add README.md dubbl be like add seeding of PRNG

64 words and counting!

Dev log 2

It's been 14 days since my last update, because I didn't write a log last weekend - but that doesn't mean no code was written!

Miner

Last weekend I mainly worked on the "repository miner" and "content determiner". The repository miner basically goes through all commits, categorizes them (by size, whether it's a "fix" commit or not) and tracks who authored and committed them. This part is not really part of the Natural Language Generation phases, because they assume that you already have your data ready.
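A rough sketch of what such a miner step could look like. This is an illustration, not the project's code: `MinedCommit`, `mine`, the size thresholds and the keyword list are all assumptions, and the commits are handed in as plain objects instead of being read from a real repository.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

FIX_WORDS = {"fix", "fixes", "fixed", "bugfix"}  # assumed keyword list


@dataclass
class MinedCommit:
    message: str
    author: str
    committer: str
    lines_changed: int

    @property
    def is_fix(self) -> bool:
        # Crude check: does the first line of the message contain a fix keyword?
        lines = self.message.splitlines()
        first_line = lines[0].lower() if lines else ""
        words = {w.strip(".,:;!()") for w in first_line.split()}
        return bool(words & FIX_WORDS)

    @property
    def size(self) -> str:
        # Arbitrary placeholder thresholds.
        if self.lines_changed < 10:
            return "tiny"
        if self.lines_changed > 500:
            return "huge"
        return "normal"


def mine(commits: list[MinedCommit]) -> tuple[Counter, Counter]:
    """Count how many commits each person authored and committed."""
    authored: Counter = Counter()
    committed: Counter = Counter()
    for commit in commits:
        authored[commit.author] += 1
        committed[commit.committer] += 1
    return authored, committed


commits = [
    MinedCommit("setup basic project structure", "dubbl", "dubbl", 120),
    MinedCommit("fix linting issues", "dubbl", "dubbl", 4),
]
authored, committed = mine(commits)
```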

Content Determination

The next phase, "Content determination", uses the mined data to identify key events (commits), important dates and the protagonists of the story. In my case those are the dates of the first commits, the first time that somebody else contributed to the project, the first person to commit something, etc. Right now the output is just a simple dictionary - I might come up with something better later, but it works for now.
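As an illustration of the "simple dictionary" idea, a content determiner could be sketched like this. The `Commit` class, the dictionary keys and the second contributor in the sample data are invented for the example; the real dictionary may well look different.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class Commit:
    author: str
    day: date
    message: str


def determine_content(commits: list[Commit]) -> dict:
    """Pick key events from a list of commits sorted oldest first."""
    if not commits:
        return {}
    first = commits[0]
    content = {
        "first_commit": first,
        "first_author": first.author,
        "last_commit": commits[-1],
        "first_other_contribution": None,
    }
    # The first commit by anybody other than the first author is a key event.
    for commit in commits:
        if commit.author != first.author:
            content["first_other_contribution"] = commit
            break
    return content


history = [
    Commit("dubbl", date(2022, 11, 3), "setup basic project structure"),
    Commit("dubbl", date(2022, 11, 6), "add README.md"),
    Commit("someone_else", date(2022, 11, 7), "hypothetical first outside commit"),
]
content = determine_content(history)
```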

Today I worked on 4 stages further down the pipeline: Document planning, aggregation, lexical choice, and realization (mostly the first and the last one though).

Document planning

Document planning means transforming the determined content into a story that makes sense. My document planning is pretty basic right now: I let the novel start with a chapter on the beginnings of the repository - looking at the important event of the first commit and the first committer. It is here where I first introduce the concept of a Sentence - even though these Sentences don't have to end up as actual sentences in the novel, because of...

Aggregation

Aggregation describes the merging of output that is similarly structured, to make reading it easier and avoid repetitions. For example:

dubbl be like setup basic project structure dubbl be like add main.py and parse first repository

could become

dubbl be like setup basic project structure and add main.py and parse first repository

Currently my aggregation is in my realizer (probably need to move it), and just checks if the next sentence happens on the same day - in that case it merges the two sentences and removes the (now repeated) time definition from the second sentence.

Lexical choice

Lexical choice is also basic, but existing: I created a dictionary with synonyms for some of the words I use. By defining the synonyms manually I can be reasonably sure that the end-result still makes sense.
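A hand-written synonym dictionary like that can be as small as the sketch below. The name `get_word` is borrowed from one of the later commit messages, but its signature and the dictionary contents here are guesses for illustration only.

```python
from __future__ import annotations

import random

# Manually curated synonyms, so every choice still makes sense in context.
SYNONYMS = {
    "author": ["author", "write", "craft", "compose", "create", "add"],
    "story": ["story", "saga", "book"],
}


def get_word(word: str, rng: random.Random) -> str:
    """Return a random synonym, or the word itself if none are defined."""
    return rng.choice(SYNONYMS.get(word, [word]))


rng = random.Random("some-commit-hash")  # seeded, so the choice is repeatable
verb = get_word("author", rng)
noun = get_word("commit", rng)  # no synonyms defined, stays "commit"
```

Passing the seeded PRNG in explicitly keeps the word choice deterministic for a given repository state.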

Realization

Finally the realizer: Here I use https://github.com/bjascob/pySimpleNLG to convert my self-made sentence structure into actual human readable sentences. This is probably the most hard-core NLG stuff for me that I struggle with a bit, but I'm also (re-)learning some grammatical concepts that I didn't have to think about since primary school. Also, often when I misuse a feature, funny output is generated - like this Gollum-esque piece of art: "Novel starts. Dubbl authorses the first commit."

Current state:

The dimly remarkable story of novelopment

By Novelopment 0.1

Introduction

While you may have been enticed to grab this book because of its title "The dimly remarkable story of novelopment", this is actually the story of 1 developer who came together to build novelopment.

Humble beginnings

The saga started when dubbl authored the first commit in 2022-11-03.

IT IS DONE!

With the last commit at Wed Nov 30 23:58:02 2022 +0100 my NaNoGenMo 2022 is over.

It might be best to tell my story through the Novelopment novel generator - it was made for this after all:

The bitterly excellent story of novelopment

Novelopment 1.0

Introduction

While you may have been enticed to grab this book because of its title "The bitterly excellent story of novelopment", this is actually the story of 1 human building novelopment in 35 commits.

Humble beginnings

The saga started whilst first time contributor dubbl authored a commit with the message "setup basic project structure", a commit with the message "add main.py and parse first repository" and a commit claiming to "add apache 2.0 license" on Thursday, November 3rd 2022. Around 3 days down the road on November 6th 2022 dubbl authored a commit described as "add more structure to novel output", a tiny commit claiming to "use pycorpora for random adjective in title", a tiny commit with the message "add README.md" and a tiny commit claiming to "add seeding of PRNG".

Working on it

On Sunday, November 13th 2022 aforementioned dubbl created a tiny commit called "add handling of title for local repositories", a commit claiming to "add and use black code formatter", a commit claiming to "data mine the repository for events (commits) and their actors", a tiny commit described as "introduce simplenlg to pluralize actor_word", a tiny bug fixing commit with the message "fix linting issues", a commit called "add initial content determiner", a tiny commit called "update readme with new parameters" and a tiny defect fixing commit called "rename src to novelopment, fix logging". About 1 week later on November the 20th 2022 they crafted a commit with the message "add basic document planner and realizer" and a commit claiming to "add very basic sentence aggregation in realizer". More than 1 week later on November 28th 2022 dubbl composed a commit called "start work on aggregator", a commit called "add complementizer handling to realizer" and a commit claiming to "handle multiple complements/objects". The very same one wrote a commit claiming to "add aggregation on time and cue phrase support", a tiny commit claiming to "handle multi-line commit messages" and a tiny commit claiming to "exclude merge commits" on Tuesday, November 29th 2022. Exactly 1 day down the road dubbl adds a commit with the message "add document planning for second committer and the end", a commit described as "start referring expressions generator", a commit called "move get_word to lexicon", a commit called "add entity description generator", a commit with the message "add time expressions", a tiny commit called "conclude the sagas final sentence", a tiny commit described as "detect and describe reverting commits", a commit with the message "updates deps, add jinja2", a commit called "add html rendering", a commit claiming to "add ebooklib dependency" and a commit with the message "add epub export option" on Wednesday, November 30th 2022.

The end (for now)

On November 30th 2022 the previously mentioned dubbl composes a tiny commit with the message "set version to 1.0" and for now the coverage concludes.

As you can see - a lot of activity in the last 3 days! But not quite enough to reach the 50k words threshold... but fear not!

My main repository for testing (when I needed to test on a larger repository than Novelopment itself) was the self-hosted selfoss RSS reader - but that story also reaches just ~36k words currently.

For a 50k-word novel, I chose the Python web framework pallets/flask. 4000 commits are enough to tell a 57,890-word novel with 359,519 characters on 139 pages. We made it!

Exported as an epub from Novelopment 1.0, and available for download as a PDF, here is "The brightly wondrous story of flask".

📎 The brightly wondrous story of flask.pdf 📜

(Github doesn't allow .epub attachments)

Final dev log

Stardate: End of NaNoGenMo 2022

Since my last dev log I moved the aggregation of similar sentences into its own module. I am aggregating sentences with the same subject, predicate and time, but different "object", into one. So "X did Y at T. X did Z at T." becomes "X did Y and Z at T". In addition to that, it merges sentences that have a close time relation into one sentence and connects both with "when" or "while". The output is not perfect, but it strings actions together.
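The "X did Y and Z at T" rule can be sketched in a few lines. This is only an illustration under simplifying assumptions: the `Sentence` class here is a stand-in (not the project's own Sentence), only directly neighbouring sentences are merged, and the "when"/"while" merging is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sentence:
    subject: str
    predicate: str
    objects: list[str]
    time: str


def aggregate(sentences: list[Sentence]) -> list[Sentence]:
    """Merge neighbours that share subject, predicate and time."""
    merged: list[Sentence] = []
    for s in sentences:
        prev = merged[-1] if merged else None
        if prev and (prev.subject, prev.predicate, prev.time) == (s.subject, s.predicate, s.time):
            prev.objects.extend(s.objects)
        else:
            # Copy the object list so the input sentences stay untouched.
            merged.append(Sentence(s.subject, s.predicate, list(s.objects), s.time))
    return merged


def join_objects(objects: list[str]) -> str:
    if len(objects) <= 1:
        return "".join(objects)
    return ", ".join(objects[:-1]) + " and " + objects[-1]


def realize(s: Sentence) -> str:
    return f"{s.subject} {s.predicate} {join_objects(s.objects)} at {s.time}."


sentences = [Sentence("X", "did", ["Y"], "T"), Sentence("X", "did", ["Z"], "T")]
result = aggregate(sentences)
print(realize(result[0]))  # X did Y and Z at T.
```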

I was struggling a bit with the document planner: It lays out the structure of the content that we want to convey, based on the content the content determiner deems relevant. Every story has between 4 and 5 chapters, depending on whether it's just one developer or multiple. After the "Introduction" it always starts with "Humble beginnings" for the first commits, followed by "Working on it" and finally "The end (for now)". But if at some point another contributor shows up, a "Two is a crowd" chapter gets prompted. I think that is actually one of the most important and beautiful events in the story of a software project 🥲 This event can happen at any point during the novel of course, so handling that was a bit tricky.
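A heavily simplified sketch of that chapter logic, to show why the position of the extra chapter depends on the data. The function name, the `beginnings` cut-off and the rule for where "Two is a crowd" goes are assumptions made for this example, not the actual planner.

```python
from __future__ import annotations


def plan_chapters(authors_in_order: list[str], beginnings: int = 3) -> list[str]:
    """Return chapter titles for a list of commit authors, oldest commit first.

    `beginnings` is the (assumed) number of commits that count as the
    "Humble beginnings" phase.
    """
    chapters = ["Introduction", "Humble beginnings", "Working on it", "The end (for now)"]
    if not authors_in_order:
        return chapters
    first_author = authors_in_order[0]
    newcomer_at = next(
        (i for i, author in enumerate(authors_in_order) if author != first_author),
        None,
    )
    if newcomer_at is None:
        return chapters  # solo project: 4 chapters
    # Put the extra chapter right after the phase in which the newcomer appears.
    insert_at = 2 if newcomer_at < beginnings else 3
    chapters.insert(insert_at, "Two is a crowd")
    return chapters


solo = plan_chapters(["dubbl"] * 5)
early = plan_chapters(["dubbl", "newcomer", "dubbl", "dubbl"])
late = plan_chapters(["dubbl"] * 5 + ["newcomer"])
```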

Finally I added the expression generator. It contains 3 steps:

Step 1 is to check if the subject of the current sentence was already the subject of the previous sentence. If that is the case it replaces the subject with a pronoun (like "they" for contributors or "it" for commits).
In the second step it creates descriptive expressions for an entity, e.g. whether it is a big or small commit, a fix or revert commit, a first time contributor or a well-known one. The commit descriptions are based on the mining of the repo (the first stage of the pipeline); the contributor descriptions of course require keeping count of how often an actor has already appeared. We don't want to re-introduce the same actor constantly, of course.
The third step is to find descriptions for time, based on the previous event, e.g. "3 days later..." or "After exactly 5 weeks...".
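The three steps above can be sketched roughly as follows. All function names, the thresholds and the exact phrasings are assumptions for illustration; the real generator has many more variations.

```python
from __future__ import annotations

from collections import Counter
from datetime import date


def plural(n: int, unit: str) -> str:
    return f"{n} {unit}" if n == 1 else f"{n} {unit}s"


def refer(subject: str, previous_subject: str | None, kind: str) -> str:
    """Step 1: use a pronoun when the subject repeats."""
    if subject == previous_subject:
        return "they" if kind == "contributor" else "it"
    return subject


def describe_actor(name: str, appearances: Counter) -> str:
    """Step 2: introduce an actor only on their first appearance."""
    appearances[name] += 1
    if appearances[name] == 1:
        return f"first time contributor {name}"
    return name


def time_expression(previous: date, current: date) -> str:
    """Step 3: describe a date relative to the previous event."""
    days = (current - previous).days
    if days <= 0:
        return "On the same day"
    if days < 7:
        return f"{plural(days, 'day')} later"
    weeks, rest = divmod(days, 7)
    if rest == 0:
        return f"After exactly {plural(weeks, 'week')}"
    return f"More than {plural(weeks, 'week')} later"


seen: Counter = Counter()
intro = describe_actor("dubbl", seen)
again = describe_actor("dubbl", seen)
gap = time_expression(date(2022, 11, 3), date(2022, 11, 6))
```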

Finally I added the epub exporting option. ebooklib makes it really easy to create epub files from Python.

That's it! Thanks for this year, it was loads of fun. Tons of ideas that I couldn't implement because of the time constraints, but the constraint is also what makes this so fun of course. See you next year hopefully! sleeeeeeeeeeeeeep 😪

Congratulations!

And oh, I'm in this story!

On October 25th 2017 first time contributor hugovk added a tiny commit with the message "Remove IRC notifications".

I've generated stories for https://github.com/python-pillow/Pillow/: pillow.txt (149k words) and https://github.com/python/cpython: cpython.txt (1.7m words).

The latter begins:

The patiently electrifying story of cpython

By Novelopment 1.0

Introduction

While you may have been enticed to grab this book because of its title "The patiently electrifying story of cpython", this is actually the story of 2313 contributors coming together to build cpython in 103021 commits.

Humble beginnings

The book started whilst first time contributor Guido van Rossum added a commit with the message "Initial revision" on August the 9th 1990. More than 4 weeks down the road on September 10th 1990 Guido van Rossum added a tiny commit called "Warning about incompleteness.". More than 1 week later on September the 18th 1990 they themselves authored a tiny commit called "Renamed intro and modules to tut and mod; added tbl to pipeline.".

NaNoGenMo / 2022

Novelopment: A repo's story #24
