Open pfctdayelise opened 10 years ago
Great idea!
I've thought about doing this too (for the language-learning aspect, not the novel-generating aspect, which would probably require translating bigger chunks and eventually some syntax). I think I've seen someone mention doing this on the language learning subreddit, but I don't know if anyone's actually generated a whole book like this. Looking forward to seeing what you come up with!
A couple of resources that may be of use to you, though perhaps you know about them already:
http://opus.lingfil.uu.se/ - Open Parallel Text corpus for several languages
http://andreeaaussi.wordpress.com/2013/03/04/how-to-do-word-alignment-with-giza-from-parallel-corpora/ - GIZA++ is open source software that can align parallel texts
Might help to resolve ambiguities like homonyms, unless of course you want to make it funny :)
This is a good idea, please do it.
Any updates?
Well, I haven't done anything on it :) Anyone is welcome to take up the idea if they like.
Take an existing novel of 50k words. About 200 pages. In the first page, change one English word for a Spanish (or whatever target language) word. All subsequent instances of that word will also be changed. On the next page, change one or two more. So by the final page, it should be mostly the target language.
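The page-by-page substitution described above could be sketched roughly like this. It is only a minimal sketch under assumptions: the novel is already split into a list of page strings, and the names `progressive_substitute`, `replace_words`, `schedule`, `glossary`, and `per_page` are all hypothetical, made up for illustration. The glossary is a plain word-for-word dictionary, so it ignores inflection, gender, and homonyms.

```python
import re

def replace_words(text, active):
    """Replace every whole-word occurrence of the active English words.

    `active` maps lowercase English words to target-language words.
    """
    if not active:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in active) + r")\b",
        re.IGNORECASE,
    )

    def swap(match):
        original = match.group(0)
        translated = active[original.lower()]
        # Keep a leading capital so sentence starts still look right.
        if original[0].isupper():
            return translated[0].upper() + translated[1:]
        return translated

    return pattern.sub(swap, text)

def progressive_substitute(pages, schedule, glossary, per_page=1):
    """Introduce `per_page` new words on each page and keep all earlier ones.

    pages:    list of page strings (assumed already split)
    schedule: English words in the order they should be introduced
    glossary: dict mapping each scheduled English word to its translation
    """
    active = {}
    result = []
    for i, page in enumerate(pages):
        for word in schedule[i * per_page:(i + 1) * per_page]:
            active[word.lower()] = glossary[word]
        result.append(replace_words(page, active))
    return result

pages = ["The dog saw the cat.", "The cat saw the dog.", "Dog and cat."]
glossary = {"dog": "perro", "cat": "gato"}
for page in progressive_substitute(pages, ["dog", "cat"], glossary):
    print(page)
```

Because the set of active words only grows, a word swapped on page 1 stays swapped on every later page, which is what makes the final pages mostly target language.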
Maybe this idea is more "for humans" than "for machines", but since learning about NaNoGenMo I have been thinking about what it means to generate/write a novel in particular, rather than a tweet or a paragraph. How can you actually have something that a reader would be motivated to read until the end, rather than be amused for 2 pages and then drop? ...Well I have no idea, so something with proven success seems like an easy place to start.
Substituting only nouns will be by far the easiest place to start... I think I like the idea of something that is as much as possible grammatically and linguistically correct, but pragmatically pretty useless (like you learn the word for "chambermaid" but no verbs). POS-tagged text would help a lot too. Also, the words should be substituted in order of frequency, most frequent first.
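One way the "most frequent nouns first" ordering might be computed is sketched below. It assumes the set of nouns has already been produced by some POS tagger; that `nouns` set and the function name `substitution_order` are hypothetical, and the tokenizer is a deliberately crude regex rather than anything linguistically careful.

```python
import re
from collections import Counter

def substitution_order(text, nouns):
    """Return the given nouns sorted by how often they occur in `text`.

    text:  the full novel as one string
    nouns: set of lowercase nouns (assumed to come from a POS tagger)
    Ties are broken alphabetically so the order is reproducible.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t in nouns)
    return sorted(counts, key=lambda w: (-counts[w], w))

text = "The maid saw the house. The maid left the house with the maid."
print(substitution_order(text, {"maid", "house"}))
```

The resulting list could serve as the order in which words get introduced, so the reader meets the most common nouns on the earliest pages.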
I should probably check on Amazon to see if someone has already dreamed this up.