Resources - Githubissues

dariusk commented 7 years ago

This is an open issue where you can comment and add resources that might come in handy for NaNoGenMo.

There are already a ton of resources on the old resources threads for the 2013 edition, the 2014 edition, and the 2015 edition.

ikarth commented 7 years ago

Drawing on last year's projects, there are some things to learn from:

Emily Short's Annals of the Parrigues was an interesting project, but from a Resources perspective her end notes on writing for the generator are full of excellent suggestions for how to write your text and when to use different approaches.

Another project I was impressed with was A Time For Destiny by @cpressey. The "Story Compiler" write-up certainly suggests some avenues for future exploration. (And the discussion around it already inspired @enkiv2 to make a goal-driven planner plot generator.)

The Deserts of the West by @mewo2 has also been a source of inspiration; this National Geographic writeup is particularly illuminating, as are the blog posts about its language generator and map generator.

I could go on; there are a ton of interesting projects to learn from. But I also want to emphasize that you don't need to get that fancy: if you're looking for an accessible way to jump into making your first book generator, you might want to check out Tracery. (There's also a Python port, if you need that.)
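To give a feel for the grammar-expansion idea behind Tracery, here is a minimal sketch in plain Python. It is not the Tracery library or its Python port and does not use their APIs; it only imitates the `#symbol#` replacement style. The rules and the `expand` helper are hypothetical, made up for illustration.

```python
import random
import re

# Hypothetical rules in a Tracery-like "#symbol#" style (illustration only).
RULES = {
    "origin": ["#greeting#, #name#. The #place# is #mood# tonight."],
    "greeting": ["Hello", "Good evening"],
    "name": ["traveller", "stranger"],
    "place": ["harbour", "desert"],
    "mood": ["quiet", "restless"],
}

SYMBOL = re.compile(r"#(\w+)#")


def expand(text, rules, rng):
    """Recursively replace each #symbol# with a randomly chosen expansion."""
    def pick(match):
        # Choose one expansion for this symbol, then expand any symbols inside it.
        return expand(rng.choice(rules[match.group(1)]), rules, rng)
    return SYMBOL.sub(pick, text)


if __name__ == "__main__":
    rng = random.Random(2016)  # fixed seed so reruns give the same sentences
    for _ in range(3):
        print(expand("#origin#", RULES, rng))
```

Adding more alternatives to each rule, or more nesting, is usually enough to get surprisingly varied output from something this small.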

Also, the Gutenberg Python library is back in active development, should you need texts or metadata from Project Gutenberg.

tra38 commented 7 years ago

I wrote a series of blog posts on dev.to that focuses on NaNoGenMo and text generation. The emphasis of these blog posts is on how to produce "readable" computer-generated text that a human may theoretically like.

Structure in Computer-Generated Novels
- Writing an algorithm to generate human-readable novels can be surprisingly tricky, but structure (in the form of simulations, story compilers, and frame stories) can make the task easier.
Using Templates in Computer-Generated Works
- Telling a computer exactly how to write a story can be a very effective (if simplistic) approach to text generation.
The Commonsense Problem in Computer-Generated Works
- Our algorithms do not share the same world view as us. But certain approaches can be used to ensure that our algorithms can still generate meaningful stories that people like.
Who are the Audiences of Computer-Generated Novels?
- There are five different groups of people that might be able to tolerate--maybe even enjoy--the output of an algorithm.

As a side note, all of these blog posts are computer-generated as well (a link to the source code is provided with each post), though the techniques I used are hard to scale, since I needed to handwrite the corpus beforehand. Still, they are useful as proofs of concept.

I also included links to several NaNoGenMo novels, so you could use these blog posts as a reference guide.

ikarth commented 7 years ago

I should probably also mention these libraries:

SpaCy is a library for natural language processing in Python that I've been using instead of NLTK lately.
WaveFunctionCollapse has just become a thing in the past month or so. The original is in C#; there's a JavaScript port, and there will probably be other implementations in the future. It was originally intended for tiled images, but there's been promising work with text.

dariusk commented 7 years ago

This sense2vec thing (using SpaCy + word2vec) seems very promising: https://explosion.ai/blog/sense2vec-with-spacy

enkiv2 commented 7 years ago

Emily Short posted a resource list yesterday on her blog: https://emshort.wordpress.com/2016/10/27/casual-procgen-text-tools/

dariusk commented 7 years ago

Ooh, @greg-kennedy just added to Corpora a list of adjectives to describe people. https://github.com/dariusk/corpora/commit/ccf894cfdcaf4894dfb4d77abb2fe9d69cbf77f1

hugovk commented 7 years ago

Created for last year's NaNoGenMo, I've updated this JSON of Project Gutenberg metadata:

https://github.com/hugovk/gutenberg-metadata

ikarth commented 7 years ago

If you need meter or rhyme for your poetic projects, some resources (mostly found after Emily Short asked a question on Twitter and many people replied with suggestions):

A set of word lists, organized by rhythmic feet.

The CMU Pronouncing Dictionary is a dictionary of 134K English words and their pronunciations, including stress. NLTK and Allison Parrish's pronouncingpy library provide Python interfaces to it.

poem-gen, a 2014 NaNoGenMo project by Camden Segal, may also be of interest, as may NaPoGenMo 2015 and NaPoGenMo 2016.
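For anyone wondering what the stress information in the CMU Pronouncing Dictionary looks like in practice, here is a rough sketch in plain Python. It does not use NLTK or Allison Parrish's library. The four sample entries were typed by hand in the dictionary's format and should be checked against the real file, and the helper names (`parse`, `stress_pattern`, `rhyming_part`, `rhymes`) are hypothetical. The rhyme test is one simple definition among several possible ones: two words rhyme if their phones match from the last primary-stressed vowel onward.

```python
# Entries in the CMU Pronouncing Dictionary's format: a word followed by
# ARPAbet phones. Digits on vowels mark stress (1 primary, 2 secondary, 0 none).
# Assumption: hand-typed samples for illustration; verify against the real file.
SAMPLE = """\
CAT  K AE1 T
HAT  HH AE1 T
NOVEL  N AA1 V AH0 L
GENERATE  JH EH1 N ER0 EY2 T
"""


def parse(text):
    """Map each lowercased word to its list of phones."""
    entries = {}
    for line in text.splitlines():
        word, *phones = line.split()
        entries[word.lower()] = phones
    return entries


def stress_pattern(phones):
    """Collect the stress digits of the vowels, e.g. '10' for stressed-unstressed."""
    return "".join(p[-1] for p in phones if p[-1].isdigit())


def rhyming_part(phones):
    """Phones from the last primary-stressed vowel to the end of the word."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            return phones[i:]
    return phones  # no primary stress found: fall back to the whole word


def rhymes(entries, a, b):
    """True if two different words share the same rhyming part."""
    return a != b and rhyming_part(entries[a]) == rhyming_part(entries[b])


if __name__ == "__main__":
    words = parse(SAMPLE)
    for word, phones in words.items():
        print(word, stress_pattern(phones), rhyming_part(phones))
    print(rhymes(words, "cat", "hat"))
```

With the full dictionary loaded the same way, filtering words by stress pattern is a cheap way to fit a meter.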

Some other resources (which may or may not have been mentioned in previous years):

textacy: higher-level NLP built on spaCy: document streaming, filtering of linguistic elements, vectorized and semantic network representations, topic models, language identification...

TextBlob is another Python option for processing textual data and NLP.

KoNLPy: Korean NLP in Python

RiTa: JavaScript/Processing/Node NLP tools for computational literature

Pressagio text prediction system: word completions in Python, etc. (A Python port of Presage)

Lexeme: A constructed language word database, generation, and declension program.

Naive Text Summary Tool

moby: JavaScript interface for the Moby Thesaurus

superMDguy commented 7 years ago

I have a pattern recognizer I'm working on that will look at text and create 'templates' for phrases, with 'variables' where you can insert names, etc.

Example: "Hello, Bob" -> "Hello, {1}".

Currently it's only able to generate templates given two lines of text, but I'm working on expanding it so it can scan an entire corpus, find the best template candidates, and convert them to templates. I'll post it here when I'm done.
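The two-line case can be sketched roughly as follows. This is not the pattern recognizer described above, whose code is not shown in this thread; it is one possible approach, assuming the lines are split into word and punctuation tokens and aligned with Python's standard `difflib.SequenceMatcher`. The function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(line):
    """Split a line into words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", line)


def detokenize(tokens):
    """Join tokens with spaces, then remove the space before punctuation."""
    return re.sub(r"\s+([^\w\s{}])", r"\1", " ".join(tokens))


def make_template(line_a, line_b):
    """Return (template, slots_a, slots_b).

    Tokens shared by both lines stay literal; each run of differing
    tokens becomes a numbered slot such as {1}.
    """
    a, b = tokenize(line_a), tokenize(line_b)
    template, slots_a, slots_b = [], [], []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            template.extend(a[i1:i2])
        else:
            # Record what each line had in this position, then emit a slot.
            slots_a.append(" ".join(a[i1:i2]))
            slots_b.append(" ".join(b[j1:j2]))
            template.append("{%d}" % len(slots_a))
    return detokenize(template), slots_a, slots_b


if __name__ == "__main__":
    # "Hello, Bob" and "Hello, Alice" give the template "Hello, {1}"
    print(make_template("Hello, Bob", "Hello, Alice"))
```

Scaling this to a whole corpus would need some way to decide which pairs of lines are worth comparing, which this sketch does not attempt.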

enkiv2 commented 7 years ago

I should note that my scene-sequel project from last year has been broken out & generalized so that it can be used as a component in a larger project (say, by having some other piece of code generate the world-model), and so anybody who has an interest in using the fuzzy goal-follower code absolutely should: https://github.com/enkiv2/scene-sequel

ikarth commented 7 years ago

For those of you working in Java (or the JVM) the Stanford CoreNLP library just released a beta of version 3.7.0. (Interfaces also exist for many other programming languages.)

superMDguy commented 7 years ago

@enkiv2 made ggc (Generative Grammar Compiler). This is useful for writing story templates.

enkiv2 commented 7 years ago

If anybody is working on poetry (or something where meter matters), this list of words grouped by part of speech and syllable count might be useful: http://www.ashley-bovan.co.uk/words/partsofspeech.html

accraze commented 7 years ago

Need character names? Here's a NodeJS module that spits out different names of characters from Infinite Jest by David Foster Wallace: https://github.com/accraze/infinitejest-names

enkiv2 commented 7 years ago

@accraze what is the license on this? It might be a good addition to dariusk/corpora in the names/ or literature/ section.

hugovk commented 7 years ago

From https://github.com/NaNoGenMo/2016/issues/114#issuecomment-264015433:

A set of blog posts about writing Annales:

Annales: the gory details in three parts

Vocabularies: using a neural network, Python and regular expressions to generate a nonsense vocabulary

TextGen: a Haskell combinator library for making up randomised sentences (plus a one-paragraph explanation of how the State monad works!)

Events: in which I get bogged down writing a succession algorithm, but also figure out how to correct a typo in a randomly-generated text

NaNoGenMo / 2016

Resources #1