Closed schwittlick closed 8 years ago
The whole blog is quite interesting and worth browsing through for half an hour.
From the FAQ:
Parsing phase:
All poems in HTML pages are downloaded using SiteSucker. The HTML is parsed using Beautiful Soup, and the poems are extracted from the HTML into text files.
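The parsing step can be sketched roughly like this with Beautiful Soup. The HTML snippet and the `<div class="poem">` wrapper are assumptions for illustration; the FAQ does not describe the actual markup of the downloaded pages.

```python
from bs4 import BeautifulSoup

# Toy HTML standing in for one page downloaded by SiteSucker.
# The <div class="poem"> wrapper is an assumed structure, not the real one.
html = """
<html><body>
<div class="poem">Roses are red,<br/>violets are blue.</div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# get_text() strips all tags; the separator preserves line breaks
# so the poem can be written out as plain text.
poem_text = soup.find("div", class_="poem").get_text(separator="\n")
print(poem_text)
```

From here each extracted poem would simply be written to its own text file.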
Analysis phase:
All words in all poems are analyzed using NLTK (Natural Language Toolkit) for POS (part-of-speech).
All poems are sent to an online deep-learning natural language processing API called Alchemy which identifies entities. “Named entities specify things such as persons, places and organizations. AlchemyAPI’s named entity extraction is capable of identifying people, companies, organizations, cities, geographic features and other typed entities”. These entities then form an archive.
All words that are not matched to a synonym in WordNet are put into a ‘reservoir’.
Generation phase:
Every entity is replaced with an entity from another poem.
Words that are not entities and not prepositions are replaced with a word from their synset (synonym, homonym, meronym…) using WordNet. If no replacement exists in the synset, these words are replaced with a random word from the ‘reservoir’.
Rudimentary correction of verb tenses is done using pattern.en (http://www.clips.ua.ac.be/pages/pattern-en).
This would also be nice to have: https://mitpress.mit.edu/aesthetic
I'm currently using it to train a model on our new, badly parsed texts. I don't have high hopes, but it's worth a try. There are a few poor results in the readme: https://github.com/Zeta36/tensorflow-tex-wavenet
http://bdp.glia.ca/wavenet-for-poem-generation-preliminary-results/