Closed. roedoejet closed this 2 months ago.
Review changes with SemanticDiff.
Analyzed 6 of 7 files.
Overall, the semantic diff is 6% smaller than the GitHub diff.
| | Filename | Status |
|---|---|---|
| :heavy_check_mark: | everyvoice/wizard/dataset.py | 2.26% smaller |
| :heavy_check_mark: | everyvoice/utils/__init__.py | 0.0% smaller |
| :heavy_check_mark: | everyvoice/tests/test_text.py | Analyzed |
| :heavy_check_mark: | everyvoice/tests/test_wizard.py | 2.92% smaller |
| :grey_question: | everyvoice/tests/data/unit-test-case1.psv | Unsupported file format |
| :heavy_check_mark: | everyvoice/model/e2e/config/__init__.py | 12.5% smaller |
| :heavy_check_mark: | everyvoice/config/text_config.py | 92.33% smaller |
CLI load time: 0:00.23
Pull Request HEAD: 30fde0c53efb793567ab7b5af7d6f51b1e46c976
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 74.54%. Comparing base (9004aad) to head (73a03ec).
Is it expected behaviour that, if no text normalization is applied, the character list contains duplicate empty-looking symbols? For example, creating a project from
/sgile/data/MohawkCorpus/am_corpus
without text normalization results in
`moh_characters: ['', '', (, ), '0', '1', '3', '8', a, c, d, e, g, h, i, k, l, m, n, o, p, r, s, t, u, v, w, x, y, z, '', à, á, è, é, ì, í, ò, ó]`
which contains three instances of ''.
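One possible explanation, which is an assumption and not something confirmed in this thread, is that the three `''` entries are different whitespace characters that render identically once printed. A minimal sketch for checking this is below. `describe_symbols` is a hypothetical helper, not part of EveryVoice, and the sample symbols are made up.

```python
import unicodedata


def describe_symbols(symbols):
    """Pair each symbol's repr() with the Unicode names of its characters.

    Hypothetical helper for illustration only; not part of EveryVoice.
    """
    described = []
    for symbol in symbols:
        # Control characters such as "\t" have no Unicode name,
        # so fall back to the code point in U+XXXX form.
        names = [unicodedata.name(ch, f"U+{ord(ch):04X}") for ch in symbol]
        described.append((repr(symbol), names))
    return described


# Made-up sample: three whitespace symbols that look alike when printed.
for entry in describe_symbols([" ", "\t", "\u00a0", "a"]):
    print(entry)
```

Running something like this over the generated character list would show whether the visually empty entries are really distinct characters.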
I also noticed that changes from #515 are highlighted as changes of this PR as well.
Great find @wiitt. No, this is not intended behaviour. An unnecessary `if response:` condition was preventing whitespace collapsing from being applied. I've fixed this and added a test to catch it in the future.
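To illustrate how a guard like that could produce the reported symptom, here is a rough sketch. It is not the actual wizard code: all function names are hypothetical, and it assumes `response` holds the list of cleaners the user selected, so an empty selection is falsy and skips everything inside the `if`.

```python
import re

_WHITESPACE_RE = re.compile(r"\s+")


def collapse_whitespace(text):
    """Replace every run of whitespace with a single space."""
    return _WHITESPACE_RE.sub(" ", text)


def clean_guarded(text, response):
    """Assumed shape of the bug: nothing runs when `response` is empty."""
    if response:
        for cleaner in response:
            text = cleaner(text)
        text = collapse_whitespace(text)
    return text


def clean_unguarded(text, response):
    """Assumed shape of the fix: whitespace is always collapsed."""
    for cleaner in response:
        text = cleaner(text)
    return collapse_whitespace(text)


def character_list(lines, clean, response):
    """Collect the sorted set of characters left after cleaning."""
    return sorted({ch for line in lines for ch in clean(line, response)})


lines = ["a\tb", "c  d", "e\nf"]  # made-up sample data
print(character_list(lines, clean_guarded, []))    # tab, newline and space all survive
print(character_list(lines, clean_unguarded, []))  # only a single space survives
```

With no cleaners selected, the guarded version keeps tab, newline, and space as three separate symbols, while the unguarded version leaves a single space.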
PR Goal?
Whitespace collapsing wasn't being applied by the wizard, only the chosen cleaners. This changes that.
Fixes?
Feedback sought?
sanity, code check
Priority?
medium
Tests added?
✅
How to test?
Confidence?
medium-high
Version change?
no
Related PRs?
#515 and #516