ebeshero opened 6 years ago
Question: Does Mitford's prose sound "more like" Jane Austen's, or more like her own writing when she writes plays? And/or like Byron's when he writes plays?
Think about which structural characteristics in the markup (markup data) might be helpful for stylometry. (This is something Patrick's curious to know...) The ontological categories matter more than the hierarchy.
Simple counts such as string-length() or the number of words per sentence (sentences delimited by end-stop punctuation followed by white space)
These quantitative metrics aren't really great at telling writers apart.
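As a rough illustration of the words-per-sentence count, here is a minimal Python sketch that applies the sentence rule above (end-stop punctuation followed by white space). The function names are hypothetical, and abbreviations such as "Mr." will be split as if they ended a sentence.

```python
import re

# A sentence boundary is end-stop punctuation (. ! ?) followed by white space.
# Simplification: abbreviations like "Mr. " are over-split.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

def words_per_sentence(text: str) -> list[int]:
    """Return the word count of each sentence in text."""
    sentences = [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]
    return [len(s.split()) for s in sentences]

def mean_sentence_length(text: str) -> float:
    """Average number of words per sentence (0.0 for empty text)."""
    counts = words_per_sentence(text)
    return sum(counts) / len(counts) if counts else 0.0

sample = "It was a fine day. We walked to the mill! Did you see it?"
print(words_per_sentence(sample))  # [5, 5, 4]
print(mean_sentence_length(sample))
```

Measures like this are easy to compute, which is partly why they are tried first, but as noted they rarely separate writers on their own.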
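Use of function words (words such as articles, prepositions, pronouns, conjunctions, and auxiliary verbs, which carry grammatical rather than lexical meaning)
stop words = words so common that processing them usually doesn't help, so text-processing pipelines often discard them
Sometimes stylometrists filter out everything except the stop words, because these show the most distinctiveness between writers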
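A minimal sketch of the "keep only the function words" approach: every token not on a small, hand-picked list is ignored, and the remaining counts are normalized by text length so texts of different sizes can be compared. The word list and the function name are illustrative assumptions, not the project's actual feature set.

```python
import re
from collections import Counter

# Illustrative only: a hand-picked handful of English function words.
# Stylometric studies often use the few hundred most frequent words instead.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "that", "it", "but", "not")

# Lowercase alphabetic tokens, keeping internal apostrophes (e.g. "don't").
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)*")

def function_word_profile(text, vocab=FUNCTION_WORDS):
    """Relative frequency of each word in vocab among all tokens of text.

    All other words are left out of the profile; dividing by the total
    token count makes texts of different lengths comparable."""
    tokens = TOKEN.findall(text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {w: counts[w] / total for w in vocab}

profile = function_word_profile("The cat and the dog sat in the sun.")
print(profile["the"], profile["and"])  # 0.3333333333333333 0.1111111111111111
```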
Plays
Possible questions for stylometry:
1) Role of actor-managers in altering plays. This needs only the performed versions.
Maybe: a list of Macready-only variants (found by comparing against the other variants)
What evidence do we see of distinct voices in the play documents?
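One way a list of Macready-only variants might be pulled out, sketched under the assumption that a collation has already been reduced to a dictionary of readings per witness at each location. The witness keys, location ids, and readings below are invented toy data, and `witness_only_variants` is a hypothetical helper.

```python
def witness_only_variants(collation, witness):
    """Return {location: reading} for every location where `witness`
    reads differently from all of the other witnesses."""
    unique = {}
    for location, readings in collation.items():
        if witness not in readings:
            continue  # this witness lacks the passage entirely
        mine = readings[witness]
        others = [r for w, r in readings.items() if w != witness]
        if others and all(r != mine for r in others):
            unique[location] = mine
    return unique

# Invented toy collation: location id -> {witness: reading}.
collation = {
    "1.1.12": {"ms": "my lord", "print": "my lord", "macready": "good my lord"},
    "1.1.13": {"ms": "away", "print": "away", "macready": "away"},
    "1.2.4": {"ms": "she comes", "print": "she enters", "macready": "she enters"},
}
print(witness_only_variants(collation, "macready"))  # {'1.1.12': 'good my lord'}
```

A reading shared by the performed version and any other witness is excluded, so only changes unique to that witness remain as candidate evidence of a distinct voice.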
Training set of files:
Can we get 3 plays by two directors, and an "unknown" testing set? (Really an outside text that we do know, so we know what the right answer is.) (Maybe one of the directors working on a different author?)
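To show how a training set plus an "unknown" test text could be scored, here is a sketch of Burrows's Delta, a common nearest-candidate attribution measure (named here as one option, not necessarily the method this project will use). Each text is assumed to be already reduced to a profile dict of word to relative frequency; features are z-scored across all training texts, and the candidate whose average z-scores are closest (mean absolute difference) to the unknown text ranks first. The candidate names and numbers are invented toy data.

```python
from statistics import mean, pstdev

def burrows_delta(training, unknown):
    """Rank candidates by Burrows's Delta distance to an unknown text.

    training: {candidate: [profile, ...]}; each profile maps word -> relative
    frequency. unknown: profile of the text to attribute. Features are the
    words in `unknown`; a word missing from a profile counts as frequency 0."""
    all_profiles = [p for profiles in training.values() for p in profiles]
    mu, sigma = {}, {}
    for f in unknown:
        values = [p.get(f, 0.0) for p in all_profiles]
        mu[f], sigma[f] = mean(values), pstdev(values)
    # Words with no variation across the training texts cannot be z-scored.
    features = [f for f in unknown if sigma[f] > 0]
    if not features:
        raise ValueError("no feature varies across the training texts")

    def z(profile):
        return {f: (profile.get(f, 0.0) - mu[f]) / sigma[f] for f in features}

    unknown_z = z(unknown)
    scores = {}
    for candidate, profiles in training.items():
        zs = [z(p) for p in profiles]
        # Candidate centroid: mean z-score of each feature over its texts.
        centroid = {f: mean(zp[f] for zp in zs) for f in features}
        # Delta: mean absolute difference between the two sets of z-scores.
        scores[candidate] = mean(abs(unknown_z[f] - centroid[f]) for f in features)
    return sorted(scores.items(), key=lambda item: item[1])

# Invented toy profiles: two hypothetical "directors" with two texts each.
training = {
    "director_A": [{"the": 0.10, "and": 0.02}, {"the": 0.11, "and": 0.03}],
    "director_B": [{"the": 0.05, "and": 0.06}, {"the": 0.04, "and": 0.05}],
}
unknown = {"the": 0.105, "and": 0.025}
print(burrows_delta(training, unknown)[0][0])  # director_A
```

With real data, an "outsider" text whose source is already known makes it possible to check whether the top-ranked candidate is the right answer.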
Processing:
1) Entirely plain text of just the play (no metadata or cast list)
2) Structural markup only (stage directions, acts, scenes, actors, and speeches)
3) Data pulled from the structural markup:
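A sketch of the three processing options for TEI-encoded plays, using Python's standard `xml.etree.ElementTree`. It assumes TEI P5 files in the `http://www.tei-c.org/ns/1.0` namespace, with the play in `text/body`, the cast list in `castList`, speeches as `sp` with `speaker`, stage directions as `stage`, and acts and scenes as `div` with `type="act"` or `type="scene"`; the project's actual encoding may differ. The structural view keeps only the element categories in document order and drops nesting, since the categories matter more here than the hierarchy. The sample document is invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"
STRUCTURE = {f"{TEI}div", f"{TEI}sp", f"{TEI}speaker", f"{TEI}stage"}

def _body(root):
    # Assumes the play lives in TEI/text/body; teiHeader and front are ignored.
    return root.find(f"{TEI}text/{TEI}body")

def _text_skipping(elem, skip):
    """Yield text pieces of elem and its descendants, skipping whole subtrees
    whose tag is in skip. A skipped child's tail is still yielded, because
    that text belongs to the parent."""
    if elem.text:
        yield elem.text
    for child in elem:
        if child.tag not in skip:
            yield from _text_skipping(child, skip)
        if child.tail:
            yield child.tail

def play_plain_text(root, skip=(f"{TEI}castList",)):
    """Option 1: plain text of the play only, whitespace normalized. Pieces
    are joined with spaces, so inline markup inside a word would add a
    spurious space (a simplification)."""
    pieces = _text_skipping(_body(root), set(skip))
    return " ".join(" ".join(pieces).split())

def structure_only(root):
    """Option 2: the sequence of structural categories, without text or nesting."""
    seq = []
    for e in _body(root).iter():
        if e.tag in STRUCTURE:
            name = e.tag[len(TEI):]
            # Report acts and scenes by their @type rather than as bare divs.
            seq.append(e.get("type", name) if name == "div" else name)
    return seq

def structural_data(root):
    """Option 3: simple counts pulled from the structural markup."""
    body = _body(root)
    speakers = Counter(" ".join("".join(s.itertext()).split())
                       for s in body.iter(f"{TEI}speaker"))
    return {
        "acts": len(body.findall(f".//{TEI}div[@type='act']")),
        "scenes": len(body.findall(f".//{TEI}div[@type='scene']")),
        "speeches": len(body.findall(f".//{TEI}sp")),
        "stage_directions": len(body.findall(f".//{TEI}stage")),
        "speeches_by_speaker": dict(speakers),
    }

# Invented toy document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader><fileDesc><titleStmt><title>Toy</title></titleStmt></fileDesc></teiHeader>
<text><body>
<castList><castItem>LORD</castItem></castList>
<div type="act"><div type="scene">
<stage>Enter Lord.</stage>
<sp><speaker>LORD</speaker><p>The hour is come.</p></sp>
<sp><speaker>SERVANT</speaker><p>My lord!</p></sp>
</div></div>
</body></text>
</TEI>"""

root = ET.fromstring(SAMPLE)
print(play_plain_text(root))  # Enter Lord. LORD The hour is come. SERVANT My lord!
print(structure_only(root))   # ['act', 'scene', 'stage', 'sp', 'speaker', 'sp', 'speaker']
print(structural_data(root))
```

Option 1 as written keeps speaker labels and stage directions; passing their tags in `skip` as well would leave only the spoken text.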
@juola