cldf / segments

Unicode Standard tokenization routines and orthography profile segmentation
Apache License 2.0

Fix problematic bug introduced by new clldutils API for CSV files due to quotechar #21

Closed: LinguList closed this 7 years ago

LinguList commented 7 years ago

The character " breaks the reading of orthography profiles, since it is the default quotechar, so I added a keyword specifying that NO quotechar should be assumed. This allows us to use " as a character to be modified (it just turned up when working on Uto-Aztecan).
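To illustrate the failure, here is a minimal sketch using Python's standard `csv` module directly. It is not the clldutils reader the PR actually changes, and the `Grapheme`/`IPA` columns and rows are hypothetical. In the standard library, `quoting=csv.QUOTE_NONE` is the explicit way to switch off quote handling, which is the effect the `quotechar=None` keyword is meant to achieve.

```python
import csv
import io

# Hypothetical tab-separated orthography profile in which '"' is itself a grapheme.
PROFILE = 'Grapheme\tIPA\na\ta\n"\tʔ\n"a\tʔa\n'


def read_profile(text, **kw):
    """Parse a tab-separated profile into a list of rows (extra kwargs go to csv.reader)."""
    return list(csv.reader(io.StringIO(text), delimiter='\t', **kw))


# Default dialect: a leading '"' opens a quoted field, so the reader runs on
# into the following line and merges two profile rows into one malformed row.
broken = read_profile(PROFILE)

# With quote processing switched off, '"' is read as an ordinary character.
fixed = read_profile(PROFILE, quoting=csv.QUOTE_NONE)

print(fixed)  # [['Grapheme', 'IPA'], ['a', 'a'], ['"', 'ʔ'], ['"a', 'ʔa']]
```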

codecov-io commented 7 years ago

Codecov Report

Merging #21 into master will not change coverage. The diff coverage is n/a.


@@           Coverage Diff           @@
##           master      #21   +/-   ##
=======================================
  Coverage   99.24%   99.24%           
=======================================
  Files           9        9           
  Lines         395      395           
=======================================
  Hits          392      392           
  Misses          3        3

Impacted Files            Coverage Δ
segments/tokenizer.py     100% <ø> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 36df580...5c190a7.

LinguList commented 7 years ago

The alternative would be to escape every such instance by doubling the quote (""), but I'd prefer the quotechar=None solution, as it requires less explanation for users.
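For comparison, a sketch of what the escaping alternative would ask of profile authors, again using the standard `csv` module rather than clldutils, with hypothetical rows. Under the default dialect a literal `"` is escaped by doubling it *inside a quoted field*, so the single grapheme `"` has to be entered as `""""` and `"a` as `"""a"`.

```python
import csv
import io

# Hypothetical profile rows whose graphemes contain the default quote character.
rows = [['Grapheme', 'IPA'], ['"', 'ʔ'], ['"a', 'ʔa']]

# Let the default dialect write the file to show the escaping users would
# otherwise have to type by hand: each such field is wrapped in quotes and
# every '"' inside it is doubled.
buf = io.StringIO()
csv.writer(buf, delimiter='\t', lineterminator='\n').writerows(rows)
escaped = buf.getvalue()
print(escaped)

# Reading the escaped text back with the default dialect recovers the graphemes.
roundtrip = list(csv.reader(io.StringIO(escaped), delimiter='\t'))
print(roundtrip == rows)  # True
```

Both approaches parse correctly; the difference is that escaping pushes the CSV quoting rules onto everyone editing a profile, while disabling the quote character keeps the file literal.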

bambooforest commented 7 years ago

I'm ok with the quotechar=None solution. Sounds good!