Open kenalba opened 4 years ago
A pretty good overview of named entity recognition: https://www.mygreatlearning.com/blog/named-entity-recognition/ Microsoft Azure also has an NLP module for this: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/named-entity-recognition For detecting gender from names, we could use NLTK or scikit-learn and build our own classifier, so we need to decide which features we'd like: https://www.geeksforgeeks.org/python-gender-identification-by-name-using-nltk/ This is an example of building a classifier: https://gist.github.com/vinovator/6e5bf1e1bc61687a1e809780c30d6bf6
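To make the "which features" question concrete, here is a rough sketch of a feature extractor for first names. The function names (`name_features`, `build_training_set`) and the particular features are just my assumptions, not anything we've agreed on. The output is a list of (feature dict, label) pairs, which is the shape NLTK's `NaiveBayesClassifier.train` accepts, but the sketch itself only uses the standard library.

```python
VOWELS = set("aeiou")


def name_features(name):
    """Turn a first name into a dict of simple, hand-picked features.

    The features here are illustrative guesses, not a tuned feature set.
    """
    cleaned = name.strip().lower()
    return {
        "first_letter": cleaned[:1],
        "last_letter": cleaned[-1:],
        "last_two": cleaned[-2:],
        "length": len(cleaned),
        # An empty name has no last letter, so this is False for "".
        "ends_in_vowel": cleaned[-1:] in VOWELS,
    }


def build_training_set(labeled_names):
    """Convert (name, label) pairs into (feature dict, label) pairs."""
    return [(name_features(name), label) for name, label in labeled_names]


if __name__ == "__main__":
    # Tiny hand-written sample, only to show the data shape.
    sample = [("Emma", "female"), ("John", "male")]
    for features, label in build_training_set(sample):
        print(label, features)
```

Whatever labeled name list we end up using would replace the tiny sample here, and we'd want to compare a few feature sets on held-out names before settling on one.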
We could build a method that takes in a Document, detects whether or not it's an epistolary novel, and then breaks the document up into a dictionary of letters (or a list of Letter objects?). We'll want to programmatically detect the writer of each letter and include that in the metadata.
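A minimal sketch of the splitting step, under some big assumptions: it works on the raw text rather than our Document class, it assumes each letter begins with a heading line like "LETTER I" or "Letter 12", and the names `Letter`, `is_epistolary`, and `split_into_letters` are placeholders. Plenty of epistolary novels mark letters differently, so the pattern would need to be extended per text.

```python
import re
from dataclasses import dataclass

# Assumed convention: a letter starts with a line such as "LETTER I" or
# "Letter 12". This is a heuristic, not a general detector.
LETTER_HEADING = re.compile(
    r"^[ \t]*letter[ \t]+(?:[ivxlcdm]+|\d+)\.?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


@dataclass
class Letter:
    heading: str
    body: str


def is_epistolary(text, min_letters=2):
    """Guess that a text is epistolary if it has enough letter headings."""
    count = sum(1 for _ in LETTER_HEADING.finditer(text))
    return count >= min_letters


def split_into_letters(text, min_letters=2):
    """Return a list of Letter objects, or [] if the text does not look epistolary."""
    matches = list(LETTER_HEADING.finditer(text))
    if len(matches) < min_letters:
        return []
    letters = []
    for i, match in enumerate(matches):
        # Each letter body runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        letters.append(
            Letter(heading=match.group(0).strip(), body=text[match.end():end].strip())
        )
    return letters
```

A list of Letter objects seems easier to extend with metadata fields later than a plain dictionary, but either would work with the same splitting logic.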
Ideally, we can programmatically determine metadata for each letter: writer, date, recipient, and so on. That's going to be tricky, but maybe possible. If we combine this functionality with our hypothetical named entity recognition module (to get a character list) and an ML-based gender guesser for each character, we can do some classy stuff.
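As a starting point for the metadata part, here is a rough heuristic sketch using only regular expressions. It assumes English letters with a "Dear ..." salutation, a written-out date like "December 11, 1817" or "11 December 1817", and a short signature on the last non-empty line. The function name `guess_letter_metadata` and all of these conventions are assumptions; real texts will break them often, which is where the NER module would come in.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Matches "December 11, 1817" or "11 December 1817" (assumed date styles).
DATE = re.compile(
    rf"\b(?:(?:{MONTHS})\s+\d{{1,2}}(?:st|nd|rd|th)?,?\s+\d{{4}}"
    rf"|\d{{1,2}}(?:st|nd|rd|th)?\s+(?:{MONTHS}),?\s+\d{{4}})\b"
)
# Matches salutation lines such as "Dear Tom," or "My dearest Sister,".
SALUTATION = re.compile(
    r"^\s*(?:My\s+)?Dear(?:est)?\s+(.+?)\s*[,:;.!-]*\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def guess_letter_metadata(letter_text, max_signature_words=4):
    """Return a dict with best-guess writer, recipient, and date (None if unknown)."""
    date_match = DATE.search(letter_text)
    salutation_match = SALUTATION.search(letter_text)

    # Treat the last non-empty line as a signature only if it is short.
    lines = [line.strip() for line in letter_text.splitlines() if line.strip()]
    writer = None
    if lines and len(lines[-1].split()) <= max_signature_words:
        writer = lines[-1].rstrip(".,")

    return {
        "writer": writer,
        "recipient": salutation_match.group(1) if salutation_match else None,
        "date": date_match.group(0) if date_match else None,
    }
```

The recipient here is whatever follows "Dear", so it may be a role ("Sister") rather than a name. Resolving that to a character would need the character list from the NER step.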