NaNoGenMo / 2016

National Novel Generation Month, 2016 edition.
https://nanogenmo.github.io

Word vector text inflector #126

Open mbwolff opened 7 years ago

mbwolff commented 7 years ago

Using gensim to build a word2vec model from over 1300 French texts from the nineteenth century, I am writing code that takes a pair of words (e.g. "homme" and "femme") and a text (Le Père Goriot, by Balzac) as parameters and generates a "modulated" text. Each word in the original text is replaced by the word that is "most similar" to it once the word pair is applied as an analogy. For instance, if "roi" is a word in the original text, it would be replaced as follows:

>>> model.most_similar(positive=['femme', 'roi'], negative=['homme'], topn=1)
[(u'reine', 0.8085041046142578)]

Handling verb conjugations and adjective agreements in French is tricky, but I aim to produce a mostly readable text. The code should be able to "modulate" any French text against any pair of words.
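
For readers who want to try the idea, the word-by-word replacement described above could be sketched roughly as follows. This is not the code from the repository, only a minimal illustration under a few assumptions: `vectors` is anything that supports `in` and `most_similar(positive=..., negative=..., topn=...)` the way a gensim `KeyedVectors` object does (in recent gensim releases that is `model.wv`), the function name `modulate` and the stand-in class `ToyVectors` are hypothetical, and mapping the source word straight to the target word is a choice made here for simplicity. Conjugation and agreement are not handled at all.

```python
import re


def modulate(text, vectors, source, target):
    """Replace each known word w by the nearest neighbour of (w - source + target)."""

    def swap(match):
        word = match.group(0)
        key = word.lower()
        if key == source:
            # Assumption of this sketch: the source word maps directly to the target.
            best = target
        elif key not in vectors:
            # Out-of-vocabulary tokens are left untouched.
            return word
        else:
            best, _score = vectors.most_similar(
                positive=[target, key], negative=[source], topn=1
            )[0]
        # Preserve an initial capital from the original token.
        return best.capitalize() if word[0].isupper() else best

    # Runs of letters (accented ones included) are treated as words.
    return re.sub(r"[^\W\d_]+", swap, text)


class ToyVectors:
    """Hypothetical stand-in for a trained model so the sketch runs on its own.

    It mimics only the two KeyedVectors features used above and answers from
    a fixed lookup table instead of real vectors.
    """

    def __init__(self, table):
        self._table = table

    def __contains__(self, word):
        return word in self._table

    def most_similar(self, positive, negative, topn=1):
        # positive is [target, word]; look the word up in the fixed table.
        return [(self._table[positive[1]], 1.0)][:topn]


toy = ToyVectors({"roi": "reine", "le": "la"})
print(modulate("Le roi parle.", toy, "homme", "femme"))  # La reine parle.
```

With a real model, `toy` would be replaced by the trained vectors, and the output would only be as grammatical as the nearest neighbours happen to be.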

mbwolff commented 7 years ago

And it's more or less done! Here's the repository with the input text, code, vector data and output. The generated novel is Madame Bovary Modulée, based on Flaubert's famous text.