Open toomuchpete opened 8 years ago
As requested, @kyfast.
Love it.
:pineapple:
There's been dialogue swapping in the past, and I did character/noun swapping between two texts as well. But nobody has tackled the problem of getting references straight. I thought about it as one of my projects this year, but don't know if I'll get to it.
I won't be sad if you do the work for the rest of us!
The word2vec-related projects have managed to translate references. If you make an explicit list of proper names in each source, you can probably make an explicit translation or use word2vec to produce correspondences for you.
I would be intrigued to see this work; one problem is eponyms, nicknames, gender-references, and titles. "King" posed a particular problem for me, as the pos-tagger I was using always decided it was a verb. @enkiv2 - can you link to one or more projects that managed to translate references?
Take a look at the translated titles and authors in https://github.com/dariusk/NaNoGenMo-2015/issues/72 ; this is what I mean. Word2vec correctly figured out that certain proper nouns were similar in the same way that it figured out that certain nouns are similar in general, from what I understand. If you whitelist proper nouns and have an explicit list of identical ways of referring to the same person which you normalize, you can do that with better reliability, but at that point you've done most of the work of creating a correspondence table between sets of characters and you might as well just do string replacement on them.
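For what it's worth, the "correspondence table plus string replacement" route could look roughly like the sketch below. The table itself is made up for illustration (the name pairings and the `NAME_MAP`, `build_pattern`, and `swap_names` names are assumptions, not anything from an existing project); the only real idea here is normalizing several aliases to one target and matching longest-first on word boundaries.

```python
import re

# Hypothetical correspondence table: every alias of a source character maps
# to a single target name. The pairings are illustrative only.
NAME_MAP = {
    "Mr. Bennet": "Mr. Krabs",
    "Elizabeth": "Sandy",
    "Lizzy": "Sandy",
}

def build_pattern(names):
    # Try longer names first so a multi-word name is not shadowed by a
    # shorter alias that happens to be a prefix of it.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(n) for n in ordered)
    # The lookarounds keep "Elizabeth" from matching inside "Elizabethan".
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")

def swap_names(text, name_map=NAME_MAP):
    pattern = build_pattern(name_map)
    return pattern.sub(lambda m: name_map[m.group(0)], text)
```

Calling `swap_names("Lizzy smiled at Mr. Bennet.")` rewrites both names in one pass. Pronouns, titles used on their own, and gendered references are not handled at all, which is the hard part discussed above.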
My Gutenberg Shuffle from 2013 attempted to respect references, but it turned out to be a bigger project than anticipated. It sort of got gender right, though I'd redo it if I went that way again.
Note that, at least for the libraries in gensim, pos-taggers work better on sentences rather than individual words.
I was thinking you'd operate on the whole sentences, but then only pay attention to the whitelisted words.
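A rough sketch of that idea: tag each sentence as a whole so the tagger has context, then keep only the tokens on the whitelist. Everything named here (`split_sentences`, `tokenize`, `whitelisted_tags`, `toy_tagger`) is hypothetical, the sentence splitter is deliberately naive, and `toy_tagger` is a stand-in so the example runs without any NLP library; a real tagger that takes a token list and returns `(token, tag)` pairs could be passed in its place.

```python
import re

def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Words (optionally with one internal apostrophe) or single punctuation marks.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

def whitelisted_tags(text, whitelist, tag_sentence):
    """Tag whole sentences, then return (token, tag) only for whitelisted tokens."""
    hits = []
    for sentence in split_sentences(text):
        tagged = tag_sentence(tokenize(sentence))  # the tagger sees the full sentence
        hits.extend((tok, tag) for tok, tag in tagged if tok in whitelist)
    return hits

def toy_tagger(tokens):
    # Stand-in tagger (an assumption for this sketch): capitalised tokens are
    # labelled NNP, everything else X. Swap in a real POS tagger here.
    return [(t, "NNP" if t[:1].isupper() else "X") for t in tokens]
```

With the whitelist `{"King", "Darcy"}`, `whitelisted_tags("The King spoke. Darcy left early.", {"King", "Darcy"}, toy_tagger)` returns only the two whitelisted names with their tags, ignoring "The" even though the toy tagger also labels it NNP.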
It sounds like what you'd need (if you did choose to somehow "translate" the names) is to have the names in the Spongebob corpus tagged for named entities, but in case it's useful to have a version of P&P that is name-tagged, the P&P e-text at Pemberley.com is conveniently so.
<P>``<A HREF="ppdrmtis.html#MrBennet">Mr. Bennet</A>, how can you abuse your own children in such way? You take delight in vexing me. You have no compassion on my poor nerves.''</P>
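If the rest of that e-text is marked up like the sample line above, pulling out the tagged names might be as simple as the sketch below. It assumes (based only on that one sample) that character references are anchors whose `HREF` ends in a `#fragment` naming the character; `NameTagParser` and `extract_names` are made-up names, and other links with fragments would need filtering.

```python
from html.parser import HTMLParser

class NameTagParser(HTMLParser):
    """Collect (character id, surface text) pairs from <A HREF="...#Id"> links."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._current = None   # fragment id of the anchor being read, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if "#" in href:
                self._current = href.split("#", 1)[1]
                self._buf = []

    def handle_data(self, data):
        if self._current is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.names.append((self._current, "".join(self._buf)))
            self._current = None

def extract_names(html):
    parser = NameTagParser()
    parser.feed(html)
    parser.close()
    return parser.names
```

Run on the sample line, this yields the id `MrBennet` paired with the surface form `Mr. Bennet`, which is the raw material for an alias list per character.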
My goal is to process a book from Project Gutenberg's Top 100 list, possibly Pride and Prejudice. The book will remain largely intact, but the quotes will be replaced with quotes generated from a corpus compiled from Spongebob Squarepants fanfic (collected from FanFiction.net).
Probably the most jarring thing to solve is getting the names right. It would be disorienting to see Spongebob's name littered around Pride and Prejudice, but maybe that will be funny? Or there's probably some replacement that can be done, translating character names between the two.
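A minimal sketch of the quote-replacement step, assuming the source text marks dialogue with straight double quotes and that the generated lines come from some callable. `QUOTE_RE`, `replace_quotes`, and `next_line` are hypothetical names, and the generator itself is out of scope here.

```python
import re

# Assumption: dialogue is delimited by straight double quotes. Texts that use
# curly quotes or `` '' pairs would need a different pattern.
QUOTE_RE = re.compile(r'"[^"]*"')

def replace_quotes(text, next_line):
    """Replace each quoted span with the next generated line, keeping the quote marks."""
    return QUOTE_RE.sub(lambda m: '"' + next_line() + '"', text)
```

Usage might look like `replace_quotes(book_text, generator)` where `generator` returns one generated line per call. Note that this drops punctuation that sat inside the original quote (such as a trailing comma before "said he"), so the narration around each replaced quote may need a cleanup pass.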