dhmit / gender_analysis

A toolkit for analyzing gendered language across sets of documents
BSD 3-Clause "New" or "Revised" License

Memoization #147

Closed: kenalba closed this 3 years ago

kenalba commented 3 years ago

This is the most basic memoization we can use. There's a thornier problem to tackle here: the mismatch between our ad hoc tokenizer, which strips out punctuation and capitalization, and the NLTK tokenizer.
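A minimal sketch of what "the most basic memoization" can look like, assuming a document object that tokenizes its own text. The names `Document`, `tokens`, and `tokenize_calls` are hypothetical and not necessarily what this PR uses. `functools.cached_property` runs the tokenizer on first access and stores the result on the instance, so later reads are free.

```python
import re
from functools import cached_property


class Document:
    """Hypothetical stand-in for a document class; only the raw text matters here."""

    def __init__(self, text: str):
        self.text = text
        self.tokenize_calls = 0  # counts how often the tokenizer actually runs

    @cached_property
    def tokens(self) -> list[str]:
        # Ad hoc tokenizer: lowercase the text, then keep runs of letters,
        # digits, and apostrophes, which drops punctuation.
        self.tokenize_calls += 1
        return re.findall(r"[a-z0-9']+", self.text.lower())


doc = Document('She said: "He left."')
first = doc.tokens   # tokenizes and caches
second = doc.tokens  # served from the cache
```

One caveat with this approach: the cache is not invalidated if `text` is reassigned later, so it assumes document text is effectively immutable after construction.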

The NLTK tokenizer is a good deal slower, but it's more accurate, and we need it whenever we do any kind of POS analysis. Ideally, I'd like to move us to a system where we use the NLTK tokenizer for our initial tokenization and perhaps store a punctuation-stripped, all-lowercase version of its output for Doing Operations on. We might also consider memoizing the POS-tagged corpus.
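The proposed setup could be sketched as follows, under stated assumptions: `TokenCache` and its attribute names are hypothetical, and the tokenizer and tagger are passed in as plain callables. The accurate tokenization runs at most once, and both the lowercase, punctuation-free view and the POS tags are derived from (and memoized alongside) that single token list.

```python
import string
from functools import cached_property
from typing import Callable

Tokenizer = Callable[[str], list[str]]
Tagger = Callable[[list[str]], list[tuple[str, str]]]


class TokenCache:
    """Hypothetical sketch: tokenize once accurately, derive the other views from it."""

    def __init__(self, text: str, tokenize: Tokenizer, pos_tag: Tagger):
        self.text = text
        self._tokenize = tokenize
        self._pos_tag = pos_tag

    @cached_property
    def tokens(self) -> list[str]:
        # Slow but accurate tokenization (e.g. nltk.word_tokenize); runs at most once.
        return self._tokenize(self.text)

    @cached_property
    def normalized_tokens(self) -> list[str]:
        # Lowercased view with pure-punctuation tokens dropped, for counting and matching.
        return [
            t.lower()
            for t in self.tokens
            if not all(c in string.punctuation for c in t)
        ]

    @cached_property
    def tagged_tokens(self) -> list[tuple[str, str]]:
        # POS tags are computed on the original-case tokens and memoized as well.
        return self._pos_tag(self.tokens)
```

With NLTK installed and its tokenizer and tagger data downloaded, you would pass `nltk.word_tokenize` and `nltk.pos_tag` as the two callables. Because each view is a lazy `cached_property`, an analysis that never asks for POS tags never pays for tagging.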

Anyway, this is fast and easy and it doesn't break anything.

... though notably this is branched off of ExpandGenderSupport and therefore should be merged after that.

codecov-io commented 3 years ago

Codecov Report

Merging #147 (b9a5a23) into master (9bafdee) will increase coverage by 5.23%. The diff coverage is 42.85%.


```diff
@@            Coverage Diff             @@
##           master     #147      +/-   ##
==========================================
+ Coverage   51.16%   56.40%   +5.23%     
==========================================
  Files          12       12              
  Lines        1675     1468     -207     
  Branches      364      362       -2     
==========================================
- Hits          857      828      -29     
+ Misses        757      586     -171     
+ Partials       61       54       -7     
```
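As a sanity check on the figures above, assuming Codecov's usual definition in which partially covered lines count against coverage, both coverage percentages follow from the hit, miss, and partial counts:

```latex
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}
\qquad
\frac{857}{857 + 757 + 61} = \frac{857}{1675} \approx 51.16\%
\qquad
\frac{828}{828 + 586 + 54} = \frac{828}{1468} \approx 56.40\%
```

By these counts, hits fell by 29 while tracked lines fell by 207 (171 of them misses), so the gain comes from fewer uncovered lines rather than more covered ones.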
| Impacted Files | Coverage Δ |
|---|---|
| gender_analysis/analysis/dunning.py | 30.00% <0.00%> (ø) |
| gender_analysis/analysis/dependency_parsing.py | 12.14% <6.25%> (-1.57%) ↓ |
| gender_analysis/analysis/instance_distance.py | 32.60% <8.69%> (-1.98%) ↓ |
| gender_analysis/analysis/gender_adjective.py | 45.27% <44.89%> (+14.89%) ↑ |
| gender_analysis/analysis/gender_frequency.py | 59.33% <57.46%> (+9.69%) ↑ |
| gender_analysis/document.py | 84.88% <100.00%> (+0.35%) ↑ |
| gender_analysis/gender.py | 100.00% <0.00%> (+2.63%) ↑ |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 9bafdee...b9a5a23.