Closed kenalba closed 4 years ago
Merging #136 into master will decrease coverage by 1.00%. The diff coverage is 32.38%.
@@ Coverage Diff @@
## master #136 +/- ##
==========================================
- Coverage 44.54% 43.53% -1.01%
==========================================
Files 12 12
Lines 1623 1656 +33
Branches 353 365 +12
==========================================
- Hits 723 721 -2
- Misses 847 882 +35
Partials 53 53
Impacted Files | Coverage Δ | |
---|---|---|
gender_analysis/analysis/dunning.py | 30.29% <0.00%> (ø) | |
gender_analysis/document.py | 82.14% <ø> (ø) | |
gender_analysis/analysis/gender_adjective.py | 30.37% <19.23%> (-9.63%) | :arrow_down: |
gender_analysis/analysis/gender_frequency.py | 49.63% <50.00%> (ø) | |
gender_analysis/corpus.py | 66.66% <50.00%> (ø) | |
gender_analysis/gender.py | 96.15% <93.33%> (ø) | |
gender_analysis/analysis/instance_distance.py | 34.58% <100.00%> (ø) | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 5d9882f...233ed1b.
This PR is kind of a mess, so consider it more of a review request than an actual pull request. I spent some time honing our testing corpora and the way we treat them.
Broadly speaking, I removed the `test_data` folder because every document in there was already in `sample_novels`. Instead, I've created three .csv files (`large_test_corpus`, `small_test_corpus`, and `tiny_test_corpus`) that select only the texts we want to test with from our `sample_novels`. Thinking about test corpora as metadata-first rather than file-first is, I think, more flexible in the long run. The addition of `tiny_test_corpus` also gives us a 4-document corpus to test our most computationally hungry functions on, which is what motivated this shift in the first place.

It did mean adding an `ignore_warnings` flag to the corpus generator, since we now expect that the generator won't load every text file in a directory.

After doing this, I looked at which functions take a long time to test and, where possible, rewrote them to work on more compact corpora. Doing this cut the time it takes to run coverage on my machine in half.