dhmit / gender_analysis

A toolkit for analyzing gendered language across sets of documents
BSD 3-Clause "New" or "Revised" License

gutenberg_stripper and a cleaned out corpus #133

Closed: kenalba closed this 4 years ago

kenalba commented 4 years ago

Updated our corpora to strip out the Project Gutenberg headers and footers, and updated the tests accordingly. I've kept the _gutenberg_cleaner function in the Document class, but we no longer call it every time we initialize a document.
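The change moves Gutenberg cleanup out of document initialization and into a one-time pass over the corpus files. Below is a minimal sketch of what such a pass could look like. It is not the repository's _gutenberg_cleaner or its gutenberg_stripper. The names strip_gutenberg and strip_corpus are hypothetical, and the sketch assumes the common "*** START OF ... PROJECT GUTENBERG EBOOK ... ***" / "*** END OF ... ***" marker lines. Older Gutenberg files use other wording that this does not handle.

```python
import re
from pathlib import Path

# Assumption: the book body sits between marker lines such as
#   *** START OF THIS PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***
#   *** END OF THIS PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***
# Wording varies ("THIS" vs "THE"), so both variants are accepted.
_START = re.compile(r"^\*{3}\s*START OF (THIS|THE) PROJECT GUTENBERG EBOOK.*$",
                    re.IGNORECASE | re.MULTILINE)
_END = re.compile(r"^\*{3}\s*END OF (THIS|THE) PROJECT GUTENBERG EBOOK.*$",
                  re.IGNORECASE | re.MULTILINE)


def strip_gutenberg(text: str) -> str:
    """Hypothetical helper: keep only the text between START and END markers.

    A missing marker leaves that end of the text untouched, so an
    already-cleaned file passes through unchanged (apart from outer whitespace).
    """
    start = _START.search(text)
    body_start = start.end() if start else 0
    # Look for the END marker only after the START marker.
    end = _END.search(text, body_start)
    body_end = end.start() if end else len(text)
    return text[body_start:body_end].strip()


def strip_corpus(src_dir: Path, dest_dir: Path) -> int:
    """Hypothetical one-time pass: write a stripped copy of every .txt file."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src_dir.glob("*.txt")):
        cleaned = strip_gutenberg(path.read_text(encoding="utf-8"))
        (dest_dir / path.name).write_text(cleaned + "\n", encoding="utf-8")
        count += 1
    return count


if __name__ == "__main__":
    sample = (
        "The Project Gutenberg EBook of X\n\n"
        "*** START OF THIS PROJECT GUTENBERG EBOOK X ***\n\n"
        "It was a dark night.\n\n"
        "*** END OF THIS PROJECT GUTENBERG EBOOK X ***\n\n"
        "License text..."
    )
    print(strip_gutenberg(sample))  # It was a dark night.
```

Cleaning once when the corpus is prepared means loading a document no longer pays for a regex scan. It also means the stored text files are exactly what the analysis sees, which is why the tests had to be updated alongside the corpora.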

codecov-io commented 4 years ago

Codecov Report

Merging #133 into master will decrease coverage by 0.03%. The diff coverage is 37.50%.


@@            Coverage Diff             @@
##           master     #133      +/-   ##
==========================================
- Coverage   44.58%   44.54%   -0.04%     
==========================================
  Files          12       12              
  Lines        1615     1623       +8     
  Branches      352      353       +1     
==========================================
+ Hits          720      723       +3     
- Misses        842      847       +5     
  Partials       53       53              
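The figures above are internally consistent with a coverage ratio of hits / (hits + misses + partials). On master that is 720 + 842 + 53 = 1615 lines, and on the PR branch it is 723 + 847 + 53 = 1623. The 8 added lines contributed 3 hits and 5 misses, which gives the 37.50% diff coverage. Below is a quick check in Python. The helper names are ours, and the two-decimal round-down is an assumption based on our understanding of Codecov's default rounding setting.

```python
import math


def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage percentage: hits / (hits + misses + partials)."""
    return 100.0 * hits / (hits + misses + partials)


def round_down(pct: float, precision: int = 2) -> float:
    """Round toward negative infinity at the given number of decimals."""
    factor = 10 ** precision
    return math.floor(pct * factor) / factor


master = coverage_pct(hits=720, misses=842, partials=53)  # 1615 lines
pr = coverage_pct(hits=723, misses=847, partials=53)      # 1623 lines
diff = coverage_pct(hits=3, misses=5, partials=0)         # the 8 added lines

print(round_down(master))     # 44.58
print(round_down(pr))         # 44.54
print(diff)                   # 37.5
print(round(pr - master, 3))  # -0.035
```

The raw change is about -0.035 percentage points. That explains why the report gives 0.03% in the summary sentence and -0.04% in the table, depending on how that value is rounded.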
| Impacted Files | Coverage Δ |
|---|---|
| gender_analysis/analysis/dunning.py | 30.29% <ø> (ø) |
| gender_analysis/analysis/gender_frequency.py | 49.63% <ø> (ø) |
| gender_analysis/corpus.py | 66.66% <ø> (ø) |
| gender_analysis/document.py | 82.14% <37.50%> (-2.24%) ⬇ |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cde02c1...f7f03a9.