Time having run out for my other grander ideas, I am reduced to (once again) taking Jane Austen's great work and injecting puerile humour into it.
This time I attempted to see if I could find words containing innuendo, generally of the sexual variety, and italicise them in a nudge-wink kind of way. After experimenting with a few ways of obtaining the words (chiefly using sense2vec to find words used in similar contexts to actual swear words), I settled on searching Urban Dictionary for words whose primary (meaning most upvoted, I think) dictionary entry contained the word 'sex'. In addition, I replaced some perfectly innocent words with grawlixes for giggles.
IT is a truth universally acknowledged, that a single man in possession of a good fortune must be in want of a &@$%.
"It will be no use to us if twenty such should come, since you will not %&@# them."
"Depend upon it, my dear, that when there are twenty I will %@#& them all."
"Indeed, Sir, I have not the least intention of &$@ing. — I entreat you not to suppose that I moved this way in order to beg for a partner*."
He was most highly esteemed by Mr. Darcy, a most intimate, confidential friend.
I do not pretend to regret any thing I shall leave in Hertfordshire, except your society, my dearest friend; but we will hope at some future period, to enjoy many returns of the delightful intercourse we have known...
Tools used/lessons learned
@#&% is called a grawlix.
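For illustration, here is a minimal sketch of how a grawlix could be generated. It is not the code used for the novel: the symbol set, the rule against repeating a symbol back to back, and the `censor` helper that keeps a trailing "ing" (as in "&$@ing" above) are all assumptions.

```python
import random

# Assumed symbol set; any handful of punctuation marks would do.
GRAWLIX_CHARS = "@#$%&"


def grawlix(length, rng=None):
    """Return `length` symbols, never repeating a symbol back to back."""
    rng = rng or random.Random()
    out = []
    for _ in range(length):
        # Exclude the previous symbol so the result looks suitably jumbled.
        choices = [c for c in GRAWLIX_CHARS if not out or c != out[-1]]
        out.append(rng.choice(choices))
    return "".join(out)


def censor(word, rng=None, suffix="ing"):
    """Replace a word with a grawlix, keeping a trailing suffix readable."""
    if word.endswith(suffix) and len(word) > len(suffix):
        stem = word[: -len(suffix)]
        return grawlix(len(stem), rng) + suffix
    return grawlix(len(word), rng)


rng = random.Random(0)  # seeded only to make the demo repeatable
print(censor("wife", rng))
print(censor("dancing", rng))
```

Keeping the suffix visible lets the reader still parse the sentence grammatically, which is most of the joke.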
spaCy (Python)
Chiefly for part-of-speech tagging and (very little) dependency parsing.
Its token.text_with_ws attribute is especially useful for maintaining good spacing.
There's still room for a Python library to do intelligent text replacement (e.g. handling a/an, conjugation, plurals, phrasal verbs, etc.) though.
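To show why the trailing-whitespace attributes matter, here is a small sketch of rebuilding text token by token while swapping some tokens out. So that it runs without spaCy installed, it uses a hypothetical `FakeToken` stand-in that exposes the same attribute names as spaCy's `Token` (`text`, `pos_`, `whitespace_`, `text_with_ws`). The `TARGETS` set and the asterisk markup for italics are assumptions too.

```python
from dataclasses import dataclass


@dataclass
class FakeToken:
    """Hypothetical stand-in exposing the spaCy Token attributes used below."""
    text: str
    pos_: str
    whitespace_: str

    @property
    def text_with_ws(self):
        return self.text + self.whitespace_


def rewrite(tokens, replace):
    """Rebuild the text, swapping any token for which `replace` returns a string."""
    pieces = []
    for tok in tokens:
        new_text = replace(tok)
        if new_text is None:
            # Untouched token: keep it with its original trailing whitespace.
            pieces.append(tok.text_with_ws)
        else:
            # Replaced token: reattach the original trailing whitespace.
            pieces.append(new_text + tok.whitespace_)
    return "".join(pieces)


TARGETS = {"intercourse"}  # assumed word list, for illustration only


def italicise_target(tok):
    """Wrap target nouns in asterisks; return None to leave a token alone."""
    if tok.pos_ == "NOUN" and tok.text.lower() in TARGETS:
        return f"*{tok.text}*"
    return None


tokens = [
    FakeToken("a", "DET", " "),
    FakeToken("delightful", "ADJ", " "),
    FakeToken("intercourse", "NOUN", ""),
    FakeToken(".", "PUNCT", ""),
]
print(rewrite(tokens, italicise_target))  # a delightful *intercourse*.
```

With spaCy itself, the same `rewrite` function should work on a parsed `Doc`, since iterating a `Doc` yields tokens with these attributes (`pos_` is only filled in when the pipeline includes a tagger).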
sense2vec
Does word2vec on (word, part-of-speech) combinations.
Trained on Reddit comments, which I was hoping would know swear words well.
Still very hard to triangulate words with multiple meanings like 'ball', which wasn't close to 'dance' or to a bunch of other likely words I tried. Further word sense disambiguation would still be useful.
Identifying words with innuendo is really hard and people are doing actual research on this.
Expanding the list of words searched for beyond 'sex' would be a good next step.
Maybe training a classifier on Urban Dictionary entries would work even better, incorporating other information like whether a word is used in sex-related subreddits versus other subreddits...
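The keyword idea can be sketched roughly as follows. This assumes the entries have already been fetched and are held as plain dictionaries. The field names `definition` and `thumbs_up`, the sample data (made up, not real Urban Dictionary entries), and the `keywords` parameter are all assumptions for illustration, not the code actually used.

```python
def top_definition(entries):
    """Return the most upvoted entry for a word, or None if there are none."""
    return max(entries, key=lambda e: e["thumbs_up"], default=None)


def innuendo_words(entries_by_word, keywords=("sex",)):
    """Return words whose top-voted definition mentions any keyword."""
    hits = []
    for word, entries in entries_by_word.items():
        top = top_definition(entries)
        if top is None:
            continue
        definition = top["definition"].lower()
        # Plain substring match, so 'sex' also matches 'sexual'.
        if any(k in definition for k in keywords):
            hits.append(word)
    return sorted(hits)


# Made-up entries, purely to exercise the functions.
sample = {
    "intercourse": [
        {"definition": "Another word for sex.", "thumbs_up": 120},
        {"definition": "Conversation between people.", "thumbs_up": 15},
    ],
    "fortune": [
        {"definition": "A large amount of money.", "thumbs_up": 40},
        {"definition": "Slang for sex appeal.", "thumbs_up": 3},
    ],
    "ball": [],
}

print(innuendo_words(sample))  # ['intercourse']
print(innuendo_words(sample, keywords=("sex", "money")))  # ['fortune', 'intercourse']
```

Only the top-voted definition is checked, so 'fortune' is skipped under the default keyword even though a lower-ranked entry mentions it. Widening `keywords` is the cheap way to expand the search; substring matching will also produce false positives on unrelated words that happen to contain a keyword.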