dariusk / NaNoGenMo-2014

National Novel Generation Month, 2014 edition.

Limited Vocabulary Writing #108

Open DHDPIC opened 9 years ago

DHDPIC commented 9 years ago

Please, sir, I want some more... words.

My first attempt at NaNoGenMo! I was working on this idea a short while ago and thought it might fit well with NaNoGenMo, so I'm giving it a go.

I was struck by some research presented to me on how many words children of certain ages can understand and say. Their understanding exceeds their ability to speak: they can only say a few of the words they know. I'm not sure of the exact figures (and every child is different), but if I remember correctly, a child of 18 months can understand between 50 and 200 words. As a child gets older, this number grows rapidly.

So I thought it would be interesting to see what stories would look like if only a few words could be written or understood and the rest were left unintelligible. This seems like something a computer program could work out pretty easily: establish the most frequent words in a story, render only those, and blank out the rest.
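A minimal Python sketch of that idea, for readers who want to experiment. It is not the original Processing code from the repo linked in this thread. The tokenizing rule (letters with optional internal apostrophes), the case-insensitive counting, and the one-`*`-per-letter masking are all assumptions; the masking was picked to resemble the raw output shown later in the thread.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, allowing internal apostrophes ("don't"),
# so the single quotes Dickens uses for dialogue are not counted as part of a word.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def top_words(text, vocab_size):
    """Return the set of the `vocab_size` most frequent words, lowercased."""
    counts = Counter(w.lower() for w in WORD_RE.findall(text))
    return {w for w, _ in counts.most_common(vocab_size)}

def limit_vocabulary(text, vocab_size, mask="*"):
    """Keep words that are in the top-N vocabulary and replace every other
    word with a run of `mask` characters of the same length."""
    vocab = top_words(text, vocab_size)

    def redact(match):
        word = match.group(0)
        return word if word.lower() in vocab else mask * len(word)

    # Punctuation and whitespace are untouched because only word matches are replaced.
    return WORD_RE.sub(redact, text)

print(limit_vocabulary("the cat sat on the mat with the dog", 1))
# the *** *** ** the *** **** the ***
```

Ties in frequency are broken by first appearance (that is how `Counter.most_common` orders equal counts), which matters mainly at small vocabulary sizes.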

I have used Charles Dickens' Oliver Twist as my source text, as I thought the line 'Please, sir, I want some more' was an appropriate hook into the concept, as well as the book being about a child. I found the text on the Project Gutenberg website and simply stripped out the unnecessary legal text at the beginning and end.
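The legal text was stripped by hand here, but for anyone repeating this with other books, a rough Python sketch of doing it automatically follows. It assumes the file uses Project Gutenberg's usual `*** START OF THIS PROJECT GUTENBERG EBOOK ...` and `*** END OF ...` marker lines; the wording of those markers varies between files, so the pattern is an assumption to check against the actual download.

```python
import re

# Assumed marker format; Gutenberg files use "THIS" or "THE" and vary in spacing.
START_RE = re.compile(
    r"^\*\*\*\s*START OF (THIS|THE) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF (THIS|THE) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)

def strip_gutenberg(text):
    """Return only the text between the START and END marker lines.
    If a marker is missing, fall back to the start or end of the file."""
    start = START_RE.search(text)
    begin = start.end() if start else 0
    # Only look for the END marker after the START marker.
    end = END_RE.search(text, begin)
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()
```

Falling back to the whole file when no markers are found keeps the function harmless on texts that have already been cleaned by hand.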

I used Processing to ingest and process the text and output the new version as a raw text file. I have output different versions for vocabularies of 50, 200, and 1000 words. I will try to use InDesign to make a version that is easier on the eye!
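One way to compare the 50-, 200- and 1000-word versions before laying them out is to measure what fraction of the word tokens survive at each vocabulary size. A small Python sketch, assuming the same letters-and-apostrophes tokenizer as above (an assumption, not the original Processing logic):

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def coverage(text, vocab_size):
    """Fraction of word tokens in `text` that belong to its `vocab_size`
    most frequent words, i.e. how much of the story stays readable."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0.0
    counts = Counter(words)
    kept = sum(count for _, count in counts.most_common(vocab_size))
    return kept / len(words)

# Usage, with a hypothetical filename:
# text = open("oliver_twist.txt", encoding="utf-8").read()
# for size in (50, 200, 1000):
#     print(size, round(coverage(text, size), 3))
```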

I'm new to this, so I'll work out how to upload the output text and the source code soon.

Here is the source code, outputted raw text, and PDFs: https://github.com/DHDPIC/Limited-Vocab-Writing

Any questions or advice please let me know!

Thanks,

David @DHDPIC

ikarth commented 9 years ago

The easiest way to upload a simple text file is as a Gist.

MichaelPaulukonis commented 9 years ago

Processing, oh my!

DHDPIC commented 9 years ago

Thanks ikarth.

MichaelPaulukonis, I hope that is a good 'oh my!'

DHDPIC commented 9 years ago

This is what the raw text looks like:

'Oh, you must not \ about *** ***.'

And here are excerpts with a 50-word vocabulary and a 200-word vocabulary:

https://gist.github.com/DHDPIC/8b8312d36c1ea0818657
https://gist.github.com/DHDPIC/d5a0401910231c9ae9cf

MichaelPaulukonis commented 9 years ago

When you first mentioned "limited vocabulary" I thought of Dr. Seuss and The Cat in the Hat.

Would it be possible to replace the redacted words with synonyms that are already allowed? Thus "saving" the text, but reducing the vocab?
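A rough Python sketch of the synonym idea, assuming a synonym table is available. The `SYNONYMS` dict below is hypothetical and hand-written for illustration; a real version might draw candidates from a thesaurus such as WordNet. Words with no allowed synonym fall back to being blanked out.

```python
import re

# Hypothetical, illustrative synonym table (not from any real thesaurus).
SYNONYMS = {
    "gruel": ["food"],
    "beadle": ["man"],
    "workhouse": ["house"],
}

WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def simplify(text, vocab, synonyms=SYNONYMS, mask="*"):
    """Keep in-vocabulary words, swap out-of-vocabulary words for the first
    synonym that is in the vocabulary, and blank out the rest."""
    def replace(match):
        word = match.group(0)
        lower = word.lower()
        if lower in vocab:
            return word
        for candidate in synonyms.get(lower, []):
            if candidate in vocab:
                # Preserve a leading capital letter from the original word.
                return candidate.capitalize() if word[0].isupper() else candidate
        return mask * len(word)

    return WORD_RE.sub(replace, text)

vocab = {"the", "man", "gave", "him", "more", "food"}
print(simplify("The beadle gave him more gruel.", vocab))
# The man gave him more food.
```

Picking the first allowed synonym is the simplest choice; it ignores context, so a word with several senses could be swapped for the wrong one.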

Alternatively, could you use the Unicode character black vertical rectangle? That would be pretty f█████g cool! If not, you can ██ ███ ████████!
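Converting the raw asterisk output to block characters is a one-line substitution. A Python sketch, assuming every `*` in the output file is a redaction mark (if the source text itself contained asterisks, those would be converted too):

```python
FULL_BLOCK = "\u2588"                # █ FULL BLOCK
BLACK_VERTICAL_RECTANGLE = "\u25ae"  # ▮ BLACK VERTICAL RECTANGLE

def blockify(raw, block=FULL_BLOCK):
    """Replace every '*' redaction character with a block character,
    so each blanked-out word keeps its length."""
    return raw.replace("*", block)

print(blockify("Oh, you must not **** about *** ***."))
# Oh, you must not ████ about ███ ███.
```

Writing the file with an explicit `encoding="utf-8"` avoids problems with these characters on platforms whose default encoding is not UTF-8.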

DHDPIC commented 9 years ago

Interestingly, that is pretty much what I have done in InDesign, using GREP styles to format the * characters differently from the rest of the text. Quick grab below:

[Screenshot, 2014-11-21 at 17:23:19]

DHDPIC commented 9 years ago

I also wanted to do something with learning and repetition: how many times does a word have to be encountered before it is learned and committed to memory? Unfortunately, I couldn't find any good data on this, so I abandoned the idea. If anyone has data on learning time or frequency of exposure, I'd love to know more!
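Without real data, the mechanics could still be sketched with a placeholder. Below is a Python sketch in which a word stays blanked until the reader has already met it a fixed number of times, after which it is printed normally. `ENCOUNTERS_TO_LEARN` is a hypothetical, arbitrary value, not a figure from any research.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

# Hypothetical threshold: there is no data behind this number.
ENCOUNTERS_TO_LEARN = 3

def progressive_reveal(text, threshold=ENCOUNTERS_TO_LEARN, mask="*"):
    """Blank each word until it has already appeared `threshold` times
    earlier in the text; from then on, print it normally."""
    seen = defaultdict(int)

    def replace(match):
        word = match.group(0)
        key = word.lower()
        known = seen[key] >= threshold
        seen[key] += 1
        return word if known else mask * len(word)

    # re.sub visits matches left to right, so earlier occurrences are counted first.
    return WORD_RE.sub(replace, text)

print(progressive_reveal("more more more more", threshold=2))
# **** **** more more
```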

DHDPIC commented 9 years ago

OK, just posted my code and the output text files to GitHub! Check it out: https://github.com/DHDPIC/Limited-Vocab-Writing

DHDPIC commented 9 years ago

And I have added some PDF versions of the text. Much nicer to look at!

DHDPIC commented 9 years ago

Not sure how/if I add a completed or preview label...

hugovk commented 9 years ago

PDFs look great!

Labels are added by repo owner @dariusk.

DHDPIC commented 9 years ago

Thanks!

hugovk commented 9 years ago

@DHDPIC Labelled!