Open rchrdlln opened 5 years ago
Repository: https://github.com/rchrdlln/nanogenmo
Novel: https://github.com/rchrdlln/nanogenmo/blob/master/Microsoft%20Word%20-%20IS_IT_LOVE.docx.pdf
Google Books search results pages usually include a sentence or two on either side of the complete sentence containing the search term. I chose some terms I hoped might add up to an interesting non-narrative and manually downloaded 5 or 10 search results pages per search term into separate folders. The first script scrapes the pages into one huge text file. The second script goes through it all, pulls out what appear to be viable sentences and clauses, then reassembles them and cleans the text up a little bit, though probably not nearly enough. I tried to avoid sucking in book blurbs as source material, without much luck.
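For anyone who wants a feel for the two-script pipeline without opening the repo, here is a rough sketch of the idea. It is not the code from the repository. The language (Python), the `*/*.html` folder layout, every function name, the word-count limits, and the shuffle-based reassembly are all assumptions made for illustration. In keeping with "no NLP", sentence detection is just a regular expression, so abbreviations like "Mr." will split sentences in the wrong place.

```python
import random
import re
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def page_to_text(markup):
    """Strip tags from one saved page and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def scrape_folders(root):
    """Stage 1: join the text of every saved page under root.

    Assumes one sub-folder per search term holding .html files.
    """
    pages = sorted(Path(root).glob("*/*.html"))
    return "\n".join(
        page_to_text(p.read_text(encoding="utf-8", errors="replace"))
        for p in pages
    )


# A capital letter, then anything up to the first ., ! or ?
SENTENCE = re.compile(r"[A-Z][^.!?]*[.!?]")


def viable_sentences(text, min_words=4, max_words=40):
    """Stage 2a: keep spans that look like complete sentences."""
    found = []
    for match in SENTENCE.finditer(text):
        sentence = match.group().strip()
        if min_words <= len(sentence.split()) <= max_words:
            found.append(sentence)
    return found


def reassemble(sentences, seed=0, per_paragraph=5):
    """Stage 2b: shuffle the sentences and group them into paragraphs."""
    rng = random.Random(seed)
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    paragraphs = [
        " ".join(shuffled[i:i + per_paragraph])
        for i in range(0, len(shuffled), per_paragraph)
    ]
    return "\n\n".join(paragraphs)
```

With pages saved under a hypothetical `pages/` directory, `reassemble(viable_sentences(scrape_folders("pages")))` would produce the output text. Filtering out book blurbs would need an extra step that this sketch does not attempt.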
No AI, no ML, no NLP. 68,421 words.
Hey, I forgot to officially sign up for this, but I did generate a "novel" in November, and I'd like to post the source code, finished work, etc. today if possible. It's entitled "IS IT LOVE" (or so it titled itself).