dhmit / gender_analysis

A toolkit for analyzing gendered language across sets of documents
BSD 3-Clause "New" or "Revised" License

SPIKE: Standardize architectural patterns #163

Closed: MBJean closed this 3 years ago

MBJean commented 3 years ago

Overview

This PR does three things:

  1. Follows up on PR #151 (the separation of all corpus and document functionality into its own package) by reorganizing all corpus_analysis and gender_analysis modules into a more conventional Python architecture. All packages now live in a single base directory (gender_analysis), a few packages have been renamed to follow the Python style guide recommendations (link), and common testing files (including shared variables and text files) have moved into their own directory (testing), while package-specific tests remain with their associated packages. The Gender Analysis Toolkit will therefore consist of three functional packages (text, gender, and analysis) along with a testing package.
  2. Updates our __init__.py files to (TBD).
  3. Deletes gender_adjectives.py. This module has been fully replaced by the proximity module (PR #159).
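
The layout described in item 1 can be sketched as follows. This is only an illustration built in a temporary directory: the base package and the four subpackage names come from the description above, while the empty `__init__.py` files are placeholders (their real contents are still TBD) and the helper names `build_skeleton`, `import_subpackages`, and `demo` are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import List

# Names taken from the PR description; everything else here is a placeholder.
BASE_PACKAGE = "gender_analysis"
SUBPACKAGES = ["text", "gender", "analysis", "testing"]


def build_skeleton(root: Path) -> Path:
    """Create an empty gender_analysis package with its subpackages under root."""
    base = root / BASE_PACKAGE
    for name in SUBPACKAGES:
        package_dir = base / name
        package_dir.mkdir(parents=True, exist_ok=True)
        # Empty placeholder: the actual __init__.py contents are not specified.
        (package_dir / "__init__.py").touch()
    (base / "__init__.py").touch()
    return base


def import_subpackages(root: Path) -> List[str]:
    """Import each subpackage from the skeleton and return its dotted name."""
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        return [
            importlib.import_module(f"{BASE_PACKAGE}.{name}").__name__
            for name in SUBPACKAGES
        ]
    finally:
        sys.path.remove(str(root))


def demo() -> List[str]:
    """Build the skeleton in a temporary directory and import every subpackage."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_skeleton(root)
        return import_subpackages(root)


if __name__ == "__main__":
    print(demo())
```

With everything under one base directory, each subpackage is addressed by a single dotted prefix (for example `gender_analysis.text`), which is the main practical effect of the reorganization.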

One upshot of this change is that all of our modules are now linted correctly in GitHub. Previously, because corpus_analysis was not inside the gender_analysis directory, it was not being linted by our GitHub Actions. This change resolves that, and it surfaced a number of possible linting fixes. I made some and skipped others, depending on which seemed most relevant.
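
To illustrate why the old layout escaped linting, here is a small sketch. It assumes the linter is pointed at a single directory and only visits files beneath it; the actual GitHub Actions configuration is not shown in this PR. The file name `example.py` and the helpers `files_in_lint_scope`, `make_files`, and `demo_lint_scope` are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def files_in_lint_scope(lint_target: Path) -> List[str]:
    """List the Python files a directory-scoped linter would visit."""
    return sorted(
        path.relative_to(lint_target.parent).as_posix()
        for path in lint_target.rglob("*.py")
    )


def make_files(root: Path, relative_paths: List[str]) -> None:
    """Create empty files (and any parent directories) under root."""
    for rel in relative_paths:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()


def demo_lint_scope() -> Tuple[List[str], List[str]]:
    """Compare lint coverage for the old and new layouts."""
    with tempfile.TemporaryDirectory() as tmp:
        before = Path(tmp) / "before"
        after = Path(tmp) / "after"
        # Old layout: corpus_analysis sits beside gender_analysis.
        make_files(before, [
            "gender_analysis/example.py",
            "corpus_analysis/example.py",
        ])
        # New layout: every package lives under gender_analysis.
        make_files(after, [
            "gender_analysis/analysis/example.py",
            "gender_analysis/text/example.py",
        ])
        return (
            files_in_lint_scope(before / "gender_analysis"),
            files_in_lint_scope(after / "gender_analysis"),
        )


if __name__ == "__main__":
    old_scope, new_scope = demo_lint_scope()
    print("before:", old_scope)
    print("after: ", new_scope)
```

In the old layout the sibling `corpus_analysis` directory never appears in the scope, while in the new layout every package does, without any change to the lint target itself.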

Directory structure before

[Image: Screen Shot 2021-05-17 at 1 30 26 PM]

Directory structure after

[Image: Screen Shot 2021-05-17 at 1 44 32 PM]