NaNoGenMo / 2019

National Novel Generation Month, 2019 edition.

Color Visions #57

Open dx9240 opened 4 years ago

dx9240 commented 4 years ago

My aim is to generate a fortunetelling log, or a journal of prophecies, partly using the Stanford Natural Language Inference (SNLI) corpus. The corpus contains photo captions which each describe a scene. These are used as premise sentences and are grouped with additional neutral, entailing, or contradicting sentences:

"A person on a horse jumps over a broken down airplane. A person is training his horse for a competition."
"A person on a horse jumps over a broken down airplane. A person is at a diner, ordering an omelette."
"A person on a horse jumps over a broken down airplane. A person is outdoors, on a horse."

https://nlp.stanford.edu/projects/snli/
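As a rough illustration of how the pairs could be pulled apart by label, here is a minimal sketch. It assumes the JSONL release of SNLI, where each line is a JSON object with `gold_label`, `sentence1` (the premise) and `sentence2` fields. The sample records are the three example pairs above typed in by hand, and `pairs_by_label` is a made-up helper name, not something from the corpus tooling or from this project's code.

```python
import json
from collections import defaultdict

# Three hand-typed records in the shape of the SNLI JSONL files
# (only the fields this sketch needs).
SAMPLE_JSONL = """\
{"gold_label": "neutral", "sentence1": "A person on a horse jumps over a broken down airplane.", "sentence2": "A person is training his horse for a competition."}
{"gold_label": "contradiction", "sentence1": "A person on a horse jumps over a broken down airplane.", "sentence2": "A person is at a diner, ordering an omelette."}
{"gold_label": "entailment", "sentence1": "A person on a horse jumps over a broken down airplane.", "sentence2": "A person is outdoors, on a horse."}
"""


def pairs_by_label(lines):
    """Group (premise, hypothesis) pairs by their gold label."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        label = record["gold_label"]
        if label == "-":
            # SNLI marks pairs without annotator consensus with "-"; skip them.
            continue
        groups[label].append((record["sentence1"], record["sentence2"]))
    return dict(groups)


groups = pairs_by_label(SAMPLE_JSONL.splitlines())
for premise, hypothesis in groups["neutral"]:
    print(premise, "|", hypothesis)
```

With the real training file, the same function could be fed an open file handle instead of `SAMPLE_JSONL.splitlines()`.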

From the sentences I've seen, they read as if the speaker is describing a snapshot of a scene before their eyes.

Inspiration: I've just finished watching 'Good Omens', in which a witch's prophecies are written on hundreds of index cards which end up being dropped and mixed up at some point in the series. I've also been wondering how the astrology industry might be using Big Data and ML to generate or recycle content. Something else I learnt recently, which I find interesting, is that the 1735 Witchcraft Act was replaced by the Fraudulent Mediums Act, which was in turn more recently replaced by consumer protection regulations. So for this project, I'd like to combine cliche fortunetelling themes with consumerism and entertainment. I'm opting for a journal structure in order to make the 50,000 word text appear somewhat coherent, although I'd love to be able to tie in a simple, overarching 'story', perhaps by using recurring customer names. Maybe the journal should also include session prices and customer reviews based on sentiment analysis.

theairdemon commented 4 years ago

This looks really interesting! I like the idea of having an overarching story to these prophecies via recurring characters. I think that's a good way of tying together disparate threads.

Also, nitpicky note here, I think you mean "Good Omens", not "Bad Omens", lol :) I definitely think that's good inspiration for this project. Agnes Nutter would be proud!

dx9240 commented 4 years ago

Derp. Thanks for pointing that out. At least it's better than the time I said "The Matrix Strikes Back".

dx9240 commented 4 years ago

The sentences in SNLI usually have a generic reference to a person, such as 'the woman' or 'a boy'. So I'm playing around with replacing some of these with names from the Tarot major arcana. The grammar needs fixing of course but the results seem amusing so far. I'm just using the neutral sentence pairs right now, and while I mean to split each pair into two different predictions, I plan on keeping them spatially close to each other in the final text in order to increase the chance of the reader feeling that they might be related.

['Half', 'of', 'the', 'yoga', 'group', 'are', 'The High Priestess', '.']
['The', 'The Lovers', 'is', 'getting', 'dirty', 'as', 'he', 'plays', 'in', 'the', 'muddy', 'puddles', '.']
['A', 'Justice', 'in', 'overalls', 'is', 'playing', 'with', 'bubbles', 'outside', '.']
['Two', 'The Tower', 'are', 'embracing', 'while', 'holding', 'to', 'go', 'packages', '.']
['A', 'The Hermit', 'is', 'giving', 'a', 'presentation', 'in', 'front', 'of', 'a', 'large', 'crowd', '.']
['A', 'The Fool', 'and', 'his', 'friend', 'are', 'running', 'a', 'marathon', 'together', '.']
['The', 'The World', 'is', 'spelling', 'out', 'her', 'favorite', 'words', 'with', 'colored', 'letters', '.']
['A', 'Temperance', 'is', 'gaining', 'momentum', 'to', 'flip', 'off', 'the', 'swing']
['An', 'old', 'middle', 'eastern', 'Death', 'is', 'selling', 'corn-on-the-cob', 'from', 'his', 'brother', "'s", 'cart', '.']
['A', 'The Devil', 'selling', 'donuts', 'to', 'a', 'customer', 'during', 'a', 'world', 'exhibition', 'event', 'held', 'in', 'the', 'city', 'of', 'Angeles']
['Two', 'doctors', 'are', 'performing', 'surgery', 'on', 'a', 'The Devil', '.']
['A', 'The Sun', 'selling', 'donuts', 'to', 'a', 'customer', 'during', 'a', 'world', 'exhibition', 'event', 'while', 'people', 'wait', 'in', 'line', 'behind', 'him', '.']
['A', 'little', 'The Fool', 'in', 'a', 'blue', 'shirt', 'holding', 'a', 'toy', '.']
['A', 'The Hierophant', 'swings', 'high', 'in', 'the', 'air', '.']
['A', 'The Empress', 'in', 'the', 'middle', 'east', 'with', 'a', 'corn-on-the-cob', 'cart', 'selling', 'corn', '.']
['A', 'The High Priestess', 'and', 'a', 'woman', 'are', 'talking', 'in', 'a', 'park']
['A', 'The High Priestess', 'and', 'a', 'The Chariot', 'are', 'talking', 'in', 'a', 'park']
['A', 'The Magician', 'running', 'a', 'marathon', 'talks', 'to', 'his', 'friend', '.']
['A', 'The Magician', 'uses', 'a', 'projector', 'to', 'give', 'a', 'presentation', '.']
['An', 'older', 'The High Priestess', 'speaking', 'at', 'a', 'podium', '.']
['A', 'The Devil', 'in', 'overalls', 'blows', 'bubbles', '.']
['The', 'little', 'The Lovers', 'is', 'jumping', 'into', 'a', 'puddle', 'on', 'the', 'street', '.']
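For anyone curious how a substitution like this might look, here is a minimal sketch. It is not the code from this project: the regex tokenizer, the `PERSON_WORDS` list and the function names are all assumptions made for illustration. It also drops an article directly before the swapped word as one possible grammar fix, so "A boy" becomes a bare card name rather than "A The Hermit".

```python
import random
import re

# The 22 cards of the Tarot major arcana.
MAJOR_ARCANA = [
    "The Fool", "The Magician", "The High Priestess", "The Empress",
    "The Emperor", "The Hierophant", "The Lovers", "The Chariot",
    "Strength", "The Hermit", "Wheel of Fortune", "Justice",
    "The Hanged Man", "Death", "Temperance", "The Devil",
    "The Tower", "The Star", "The Moon", "The Sun",
    "Judgement", "The World",
]

# Hypothetical list of generic person words to swap out.
PERSON_WORDS = {"man", "woman", "boy", "girl", "person", "child", "men", "women"}
ARTICLES = {"a", "an", "the"}


def tokenize(sentence):
    """Very rough tokenizer: words (with hyphens/apostrophes) and punctuation."""
    return re.findall(r"[\w'-]+|[.,!?;]", sentence)


def arcanify(tokens, rng=random):
    """Replace generic person words with a randomly chosen card name."""
    out = []
    for token in tokens:
        if token.lower() in PERSON_WORDS:
            # Drop a directly preceding article so we don't get "A The Fool".
            if out and out[-1].lower() in ARTICLES:
                out.pop()
            out.append(rng.choice(MAJOR_ARCANA))
        else:
            out.append(token)
    return out


def detokenize(tokens):
    """Join tokens back into a sentence, closing the gap before punctuation."""
    return re.sub(r"\s+([.,!?;])", r"\1", " ".join(tokens))


rng = random.Random(2019)  # seeded so a run can be repeated
tokens = tokenize("A boy in overalls is playing with bubbles outside.")
print(detokenize(arcanify(tokens, rng)))
```

Plurals and adjectives are left alone here, so "Two men" would still come out as "Two" followed by a single card name, and "A little boy" keeps its article.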

dx9240 commented 4 years ago

I noticed that there were many colors mentioned in the dataset, so I decided to create chapters by filtering for sentences with colors in them, then grouping together all the sentences which contained the same color. Using the SNLI training set, this resulted in about 69,000 words.
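The filtering and grouping step described above could be sketched roughly like this. The color list, the sample sentences and the function names are assumptions for illustration only, and a sentence that mentions two colors is placed in both chapters here, which may not match what the actual code does.

```python
import re
from collections import defaultdict

# Hypothetical color list; the real project may have used a different one.
COLORS = ["red", "orange", "yellow", "green", "blue", "purple",
          "pink", "brown", "black", "white", "gray"]
COLOR_RE = re.compile(r"\b(" + "|".join(COLORS) + r")\b", re.IGNORECASE)


def chapters_by_color(sentences):
    """Map each color to the sentences that mention it (one chapter per color)."""
    chapters = defaultdict(list)
    for sentence in sentences:
        found = {match.lower() for match in COLOR_RE.findall(sentence)}
        for color in sorted(found):
            chapters[color].append(sentence)
    return dict(chapters)


def word_count(chapters):
    """Count whitespace-separated words across all chapters."""
    return sum(len(s.split()) for group in chapters.values() for s in group)


# Hand-written sample sentences, not taken from the corpus.
sample = [
    "A little boy in a blue shirt holding a toy.",
    "A woman in a red coat waits by a blue door.",
    "Two men are talking in a park.",
]

chapters = chapters_by_color(sample)
for color, group in chapters.items():
    print(color.upper())
    for sentence in group:
        print("  " + sentence)
print("total words:", word_count(chapters))
```

The `\b` word boundaries keep words such as "bored" from matching "red"; sentences without any listed color are simply left out.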

I was going to use each sentence as a seed of some sort for generating more content, and I hope to do so in future. When I have time later, I'd love to format the output text to look like it may have come out of an arcade fortune telling machine. Or even better, like the output of a prediction machine built by a mad 1960s AI/computer scientist.

Some of the fortunetelling arcade machines I've seen model their mannequins on racist caricatures. To break from this, I would like to remove the racial references present in the dataset, in an effort to avoid reproducing harmful stereotypes.

dx9240 commented 4 years ago

Here is the novel, Color Visions: color_visions_tkacz.txt

And the code, although it needs A LOT of cleaning up: color_visions_python3_code.txt