xrotwang closed 6 years ago
Merging #35 into master will decrease coverage by 0.11%. The diff coverage is 96.92%.
@@ Coverage Diff @@
## master #35 +/- ##
==========================================
- Coverage 99.24% 99.12% -0.12%
==========================================
Files 6 7 +1
Lines 264 229 -35
Branches 56 0 -56
==========================================
- Hits 262 227 -35
- Misses 0 2 +2
+ Partials 2 0 -2
Impacted Files | Coverage Δ | |
---|---|---|
src/segments/metadata.py | 0% <0%> (ø) | |
src/segments/errors.py | 100% <100%> (ø) | :arrow_up: |
src/segments/tokenizer.py | 100% <100%> (ø) | :arrow_up: |
src/segments/__main__.py | 100% <100%> (ø) | :arrow_up: |
src/segments/tree.py | 100% <100%> (+3.33%) | :arrow_up: |
src/segments/util.py | 100% <100%> (ø) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 012c1ff...647626a.
Seems great. You forgot to mention that this also implements REPLACEMENT_MARKER, which, while perhaps trivial, is a very positive change in my view. +1 for merging.
I like how this condenses the main functionality into
If we now add a bunch of tests with cases we encounter in lexibank, there's not too much risk of running into problems with backwards incompatibility, I guess.
As far as I'm concerned, we could even make the orthography profile and the input/output from the LingPy tutorial part of the segments test suite.
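One way such regression cases could be organized is as a table of (input, expected output) pairs run through a single test. The sketch below is an assumption about layout only: `tokenize` is a hypothetical stand-in that splits every word into characters, and the cases are made-up placeholders, not lexibank or LingPy tutorial data.

```python
import unittest


def tokenize(text):
    """Hypothetical stand-in: one segment per character, words joined by ' # '.

    Replace this with the real callable under test.
    """
    return " # ".join(" ".join(word) for word in text.split())


# Placeholder cases; real ones would come from data seen in practice.
CASES = [
    ("ab", "a b"),
    ("ab cd", "a b # c d"),
    ("", ""),
]


class RegressionTests(unittest.TestCase):
    def test_cases(self):
        for text, expected in CASES:
            # subTest reports each failing case separately.
            with self.subTest(text=text):
                self.assertEqual(tokenize(text), expected)


def run_cases():
    """Run the suite programmatically; return True if every case passes."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RegressionTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Keeping the cases in a plain list means a newly encountered input only needs one extra line, and the expected output documents the behavior that must stay stable across releases.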
@bambooforest Ok with merging? From my point of view, this is backwards compatible - although I did remove some non-private methods on Tokenizer (which I think should have been private to begin with :) ). If so, this would make 1.2; otherwise I'd also be happy to start the 2.x series with this.
Short question: is the new behavior of passing lists as lists already implemented here, or do we carry on in the way we know? If the former, could you provide a short example of this?
@LinguList the external interface has not changed - input and output are still strings; only internally is data handled as lists. I.e. .split() and ' '.join() are only called once and in the outermost layer of code, when receiving input and when creating output.
@LinguList The main goal of this refactoring is transparency: Passing data as strings between private methods meant splitting and joining multiple times. It also meant that (configurable) word and segment boundary markers had to be known to (and used by) each private method, which seemed error-prone.
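A minimal sketch of the pattern described above, not the actual segments code: the class name `ToyTokenizer`, its methods, and the default markers are all hypothetical. The public `__call__` splits once on input and joins once on output, while the private helper only ever returns lists and never needs to know the boundary markers.

```python
class ToyTokenizer:
    """Hypothetical illustration of the refactoring idea; not the segments API."""

    def __init__(self, graphemes, segment_separator=" ", word_separator=" # "):
        # Longest graphemes first, so greedy matching prefers "sch" over "s".
        # Empty strings are dropped because they would never advance the match.
        self._graphemes = sorted((g for g in graphemes if g), key=len, reverse=True)
        self.segment_separator = segment_separator
        self.word_separator = word_separator

    def _segment(self, word):
        """Private helper: takes one word (str) and returns a list of graphemes."""
        result, i = [], 0
        while i < len(word):
            for g in self._graphemes:
                if word.startswith(g, i):
                    result.append(g)
                    i += len(g)
                    break
            else:
                # Unknown character: keep it as its own segment.
                result.append(word[i])
                i += 1
        return result

    def __call__(self, text):
        # Outermost layer: split the input string exactly once ...
        words = text.split()
        # ... handle the data as lists of lists internally ...
        segmented = [self._segment(w) for w in words]
        # ... and join exactly once when creating the output string.
        return self.word_separator.join(
            self.segment_separator.join(segs) for segs in segmented
        )
```

With this layout, `ToyTokenizer(["sch", "a", "l"])("schal la")` returns `"sch a l # l a"`, and changing a marker only affects the single join at the end.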
@xrotwang - I guess you have to act fast around here :-) Anyway, thanks for the update.
@xrotwang, I see, and I agree that this is quite important. I remember having some issues with this in the past.
@bambooforest sorry for rushing this. I have a long list of stuff depending on this, which made me nervous. I truly think this is backwards compatible, though :)
closes #27 closes #32