dhmit / gender_analysis

A toolkit for analyzing gendered language across sets of documents
BSD 3-Clause "New" or "Revised" License

Improve User-Facing Output #173

Open joshfeli opened 3 years ago

joshfeli commented 3 years ago

Our approach looks at one problem from two perspectives, each corresponding to a specific audience and method of interacting with the software:

  1. Interaction with the interface only.

This includes using the command line and print statements, and perhaps giving the user additional ways to visualize data, either as plots or directly on the command line.

Here is an example taken from the Quickstart Guide:

>>> analyzer.by_gender(format_by='relative', group_by='label')
{
    'Female': {'subject': 0.1024149702870148, 'object': 0.13351877607788595, 'other': 0.00518396763181186},
    'Male': {'subject': 0.32127955493741306, 'object': 0.11948413200151727, 'other': 0.20091035529144013},
    'Nonbinary': {'subject': 0.0584144645340751, 'object': 0.058793779238841826, 'other': 0.0}
}

This display, taken from the documentation, is considerably neater than the output currently shown on the command line. We could clean up the output so that a user who cannot interact with the data beyond the command line can still visualize it.
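As a rough illustration of what a cleaner command-line display could look like, here is a minimal sketch that renders the nested dictionary from the Quickstart example as an aligned text table. The function name `format_table` and its `decimals` parameter are hypothetical and not part of the package; the sample values are copied from the output above.

```python
# Sample data copied from the Quickstart output above.
SAMPLE = {
    'Female': {'subject': 0.1024149702870148, 'object': 0.13351877607788595, 'other': 0.00518396763181186},
    'Male': {'subject': 0.32127955493741306, 'object': 0.11948413200151727, 'other': 0.20091035529144013},
    'Nonbinary': {'subject': 0.0584144645340751, 'object': 0.058793779238841826, 'other': 0.0},
}


def format_table(results, decimals=1):
    """Render a {label: {role: proportion}} mapping as an aligned text table.

    Hypothetical helper for illustration only; not an existing package function.
    """
    # Collect column names in first-seen order so the table mirrors the input.
    roles = []
    for counts in results.values():
        for role in counts:
            if role not in roles:
                roles.append(role)

    label_width = max(len("label"), max(len(label) for label in results))
    # Wide enough for the longest role name or a value such as "100.0%".
    col_width = max(max(len(role) for role in roles), decimals + 5)

    header = "label".ljust(label_width) + "".join(
        "  " + role.rjust(col_width) for role in roles
    )
    lines = [header, "-" * len(header)]
    for label, counts in results.items():
        cells = "".join(
            "  " + f"{counts.get(role, 0.0):.{decimals}%}".rjust(col_width)
            for role in roles
        )
        lines.append(label.ljust(label_width) + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(SAMPLE))
```

Returning a string rather than printing directly would let the same helper serve both the command line and anyone who wants to log or save the table.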

In addition to potential visualizations (exposed as additional commands on the command line), we could print clarifying sentences as extra output (e.g., “female characters, who tend to be described as asleep or dead, are the subjects of sentences 10% of the time and the objects 13%; male characters, conversely, whose verbs and adjectives are more active, are subjects 32% of the time and objects 12%. They/them occur about evenly across the two.”).
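The numeric part of such a sentence could be generated directly from the result dictionary. The sketch below is only an assumption about how that might look: `describe_roles` is a hypothetical name, and the qualitative remarks in the example sentence (such as which adjectives a group tends to receive) would have to come from other analyses that this sketch does not attempt.

```python
# Sample data copied from the Quickstart output above.
SAMPLE = {
    'Female': {'subject': 0.1024149702870148, 'object': 0.13351877607788595, 'other': 0.00518396763181186},
    'Male': {'subject': 0.32127955493741306, 'object': 0.11948413200151727, 'other': 0.20091035529144013},
    'Nonbinary': {'subject': 0.0584144645340751, 'object': 0.058793779238841826, 'other': 0.0},
}


def describe_roles(results):
    """Turn {label: {role: proportion}} into one plain-English sentence per label.

    Hypothetical helper for illustration only; not an existing package function.
    """
    sentences = []
    for label, roles in results.items():
        subject = roles.get("subject", 0.0)
        obj = roles.get("object", 0.0)
        # ".0%" multiplies by 100 and rounds to a whole percentage.
        sentences.append(
            f"{label} characters are the subjects of sentences {subject:.0%} "
            f"of the time and the objects {obj:.0%}."
        )
    return sentences


if __name__ == "__main__":
    for sentence in describe_roles(SAMPLE):
        print(sentence)
```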

  2. More thorough interaction with the codebase.

From this perspective, the user may be a more experienced developer able to create their own data transformations and use our data structures to create plots as needed.

While these seem like two separate goals, building visualizations and methods for the first approach would lead to data structures that are easier to work with for a developer who wants to create their own data transformations.

Because user-facing output is the goal of this issue, much of our work will focus on improving the package's interface, whether by taking input from the user (e.g., Y/N prompts) or by supporting more keyword arguments that control the type of visualization and where it is stored on disk.
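To make the interface idea concrete, here is a minimal sketch that combines a Y/N prompt with keyword arguments controlling the output type and its location on disk. Everything in it is an assumption for discussion: `confirm`, `save_results`, and the `file_format`, `overwrite`, and `ask` parameters are hypothetical names, and it writes JSON or CSV files rather than plots so that it depends only on the standard library.

```python
import csv
import json
from pathlib import Path


def confirm(prompt, ask=input):
    """Ask a Y/N question until the user gives a recognizable answer."""
    while True:
        answer = ask(f"{prompt} [y/n] ").strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False


def save_results(results, path, file_format="json", overwrite=False, ask=input):
    """Write {label: {role: proportion}} results to disk.

    Hypothetical helper for illustration only. Returns the written path,
    or None if the user declined to overwrite an existing file.
    """
    path = Path(path)
    if path.exists() and not overwrite:
        if not confirm(f"{path} already exists. Overwrite?", ask=ask):
            return None

    if file_format == "json":
        path.write_text(json.dumps(results, indent=2), encoding="utf-8")
    elif file_format == "csv":
        # One row per label, one column per role, with roles sorted by name.
        roles = sorted({role for counts in results.values() for role in counts})
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["label", *roles])
            for label, counts in results.items():
                writer.writerow([label, *(counts.get(role, 0.0) for role in roles)])
    else:
        raise ValueError(f"unsupported file_format: {file_format!r}")
    return path
```

Passing the prompt function in as `ask` keeps the Y/N step testable and lets scripts skip it entirely with `overwrite=True`.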