DigitalMitford / DigMitCS

repository of materials for the Digital Mitford Coding School
https://digitalmitford.github.io/DigMitCS/
GNU Affero General Public License v3.0

Stylometry Workshop Prep #1

Open ebeshero opened 6 years ago

ebeshero commented 6 years ago

Plays

Possible questions for stylometry:

1) Role of actor-managers in altering plays. This needs just the performed versions. Maybe: a list of Macready-only variants (based on difference from variants).

What evidence do we see of distinct voices in the play documents?

Training set of files:

Can we get 3 plays by two directors? And an "Unknown" testing set (but really an outsider that we know, so we know what the right answer is)? (Maybe the director on a different author?)

Processing:

1) entirely plain text of just the play (no metadata or cast list)
2) structural markup only (stage directions, acts, scenes, actors, and speeches)
3) Data pulled from structural markup:

@juola

ebeshero commented 6 years ago

Alternative: (possibly a longer collaborative research project post 25 May)

Question: Does Mitford writing prose sound "more like" Jane Austen, or like herself when she writes plays? And/or like Byron when he writes plays?

Think about structural characteristics from the markup (markup data) that might be helpful for stylometry. (This is something Patrick's curious to know...) (the ontological categories are more important than the hierarchy)
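One way to treat markup as data without relying on the hierarchy is to count elements by name wherever they occur. This is a minimal sketch, assuming a TEI-like encoding. The sample fragment and its element names (`stage`, `sp`, `speaker`, `l`) are illustrative assumptions rather than the project's actual files, and real TEI files carry a namespace that would appear in each tag name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical TEI-like fragment (no namespace); not taken from the project files.
SAMPLE = """<div type="act">
  <div type="scene">
    <stage>Enter two lords.</stage>
    <sp><speaker>First Lord</speaker><l>Who comes here?</l></sp>
    <sp><speaker>Second Lord</speaker><l>A friend.</l><stage>Aside.</stage></sp>
  </div>
</div>"""

def element_counts(xml_text):
    """Count elements by name, ignoring where they sit in the tree."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter())

counts = element_counts(SAMPLE)

# One possible derived feature: stage directions per speech.
stage_per_speech = counts["stage"] / counts["sp"]
print(counts, stage_per_speech)
```

Counts like these could be normalized by play length before comparing documents, so that longer plays do not dominate.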

ebeshero commented 6 years ago

methods / parameters:

string-length()

number of words per sentence (sentences determined by end-stop punctuation followed by white space)
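The sentence rule above can be sketched in a few lines of Python. The function names and the sample text are hypothetical, and the end-stop set (`.`, `!`, `?`) is an assumption. The rule is deliberately naive, so abbreviations such as "Mr. Macready" would be split incorrectly.

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' when followed by white space."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def words_per_sentence(text):
    """Number of white-space-separated words in each sentence."""
    return [len(s.split()) for s in split_sentences(text)]

def mean_sentence_length(text):
    """Average words per sentence; 0.0 for empty input."""
    lengths = words_per_sentence(text)
    return sum(lengths) / len(lengths) if lengths else 0.0

# Invented sample text, for illustration only.
sample = "Who comes here? A friend. Speak, what news from the city!"
lengths = words_per_sentence(sample)
char_lengths = [len(s) for s in split_sentences(sample)]  # analogous to string-length()
print(lengths, mean_sentence_length(sample), char_lengths)
```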

These quantitative metrics aren't really great distinguishers.

Use of function words (= words whose meaning is defined by context)

stop words = words that are so common that processing them doesn't help

Sometimes stylometrists filter out everything except stop words, because these show the most distinctiveness.
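Here is a rough sketch of that filtering step: keep only the stop words and express each as a share of all tokens in the text. The stop list is a tiny illustrative assumption (real lists run to hundreds of words), and the tokenizer and function name are likewise hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop list; an assumption, not a standard list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "but", "or", "is", "it", "that"}

def function_word_profile(text, stop_words=STOP_WORDS):
    """Relative frequency of each stop word among all tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(t for t in tokens if t in stop_words)
    return {w: counts[w] / total for w in sorted(stop_words)}

# Invented sample sentence, for illustration only.
sample = "The king and the queen went to the city of the dead."
profile = function_word_profile(sample)
print(profile)
```

Profiles built this way for each training play could then be compared with the profile of the "Unknown" test text, for example with a distance measure over the shared stop-word list.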