cldf / segments

Unicode Standard tokenization routines and orthography profile segmentation
Apache License 2.0

Fix confusing column labels in profile __unicode__ result #39

Closed Anaphory closed 6 years ago

Anaphory commented 6 years ago

The README showed a different column header than I expected from the lines above and below it. I thought this might be a typo, but my installation confirmed that this was the implemented behaviour.

Looking at the implementation of the Profile.__unicode__ method showed that the data columns are reordered, but the header columns are not. This PR fixes that problem.
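
To illustrate the kind of mismatch described above, here is a minimal sketch, not the actual code in src/segments/tokenizer.py. The helper names `column_order` and `format_profile`, the column name "Grapheme", and the sample rows are all assumptions made for the example. The point is that the header and every data row are rendered from one shared column order, so they cannot drift apart.

```python
GRAPHEME_COL = "Grapheme"  # assumed name of the grapheme column


def column_order(columns, first=GRAPHEME_COL):
    """Return the columns with `first` in front and the rest sorted.

    Assumes `first` is actually present in `columns`.
    """
    rest = sorted(c for c in columns if c != first)
    return [first] + rest


def format_profile(rows):
    """Render a list of dict rows as tab-separated text (hypothetical helper)."""
    if not rows:
        return ""
    # Compute the order once and reuse it for the header AND the data rows.
    order = column_order(rows[0].keys())
    lines = ["\t".join(order)]
    for row in rows:
        lines.append("\t".join(str(row.get(col, "")) for col in order))
    return "\n".join(lines)


# Made-up sample data, only for demonstration.
rows = [
    {"IPA": "a", "Grapheme": "aa", "Frequency": 3},
    {"IPA": "b", "Grapheme": "b", "Frequency": 1},
]
text = format_profile(rows)
print(text)
```

The bug pattern reported here corresponds to building the header from the unsorted keys while building the rows from `order`; deriving both from the same list avoids it.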

codecov-io commented 6 years ago

Codecov Report

Merging #39 into master will not change coverage. The diff coverage is 100%.


@@           Coverage Diff           @@
##           master      #39   +/-   ##
=======================================
  Coverage   99.12%   99.12%           
=======================================
  Files           7        7           
  Lines         229      229           
=======================================
  Hits          227      227           
  Misses          2        2

Impacted Files               Coverage Δ
src/segments/tokenizer.py    100% <100%> (ø) :arrow_up:


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 9c193be...936630a.

xrotwang commented 6 years ago

Hm. I don't really follow. The __unicode__ method does apply the same "grapheme goes first" logic to data rows as well: https://github.com/Anaphory/segments/blob/936630af3646c188d2c42a0884a03f367c9bf6c1/src/segments/tokenizer.py#L84-L86

xrotwang commented 6 years ago

Ah, sorry. Got confused again: this time I mixed up the original lines and the changes when looking at the diff ...

Yes, you are perfectly right. Will merge.