cldf-clts / pyclts

Apache License 2.0

V13 #26

Closed LinguList closed 3 years ago

LinguList commented 3 years ago

@xrotwang and @tresoldi, this contains some changes that I consider quite important to make sure that we do not overgeneralize (as we were doing before, see #25). This adds more code to the functions, but in my opinion they are safe. I think it will slightly increase the number of unknown sounds in CLTS, but otherwise it should not do much harm.

LinguList commented 3 years ago

BTW: this adds a new command:

$ clts map phoible/graphemes.tsv

This command maps the unmapped sounds in the file graphemes.tsv and writes the mapped version to a file graphemes.mapped.tsv. This file should be checked manually: automatically identified pre-nasalized nasals, etc. are marked with an asterisk, and clusters are marked with (!) for extra attention. After annotation, graphemes.tsv can be pushed as a PR and checked by the CLTS team.

This gives us a mapping procedure similar to the one in Concepticon, where we effectively map things manually, but use clts for pre-processing.

So far, CLTS does everything automatically; this adds more direct control.
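
The manual check of graphemes.mapped.tsv described above could be supported by a small helper that lists only the flagged rows. This is a minimal sketch and not part of pyclts: the function name `rows_needing_review` and the sample column layout are hypothetical, and it only assumes that the asterisk and (!) markers appear somewhere in a cell of the flagged row.

```python
import csv
import io


def rows_needing_review(lines):
    """Yield (row_number, kind, row) for rows flagged for manual checking.

    `lines` is any iterable of TSV lines, for example an open file handle.
    """
    # QUOTE_NONE keeps quote characters in transcriptions from being interpreted.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for number, row in enumerate(reader, start=1):
        if any("(!)" in cell for cell in row):
            # Clusters are marked with (!) for extra attention.
            yield number, "cluster", row
        elif any("*" in cell for cell in row):
            # Automatically identified sounds are marked with an asterisk.
            yield number, "auto-identified", row


# Hypothetical sample; the real column layout of graphemes.mapped.tsv may differ.
sample = io.StringIO(
    "GRAPHEME\tBIPA\n"
    "a\ta\n"
    "mb\t*ᵐb\n"
    "kp\t(!)kp\n"
)
flagged = list(rows_needing_review(sample))
for number, kind, row in flagged:
    print(number, kind, row)
```

For a real file, the in-memory sample would be replaced by `open("graphemes.mapped.tsv", encoding="utf-8", newline="")`.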

tresoldi commented 3 years ago

Yes, good, it is pretty much what I did this morning, in a quick-and-dirty way, to map JIPA.

But does this mean we would not use the Google Sheet after all? Once this is approved and merged, I could just go through the graphemes.tsv in Phoible's transcription data, perhaps before handing it to @cormacanderson?

LinguList commented 3 years ago

As I want @Cormacanderson and you to work together on this (and together with me), we will use the Google Sheet, where I am now uploading the files. For any additional datasets, you can go the manual way, but I assume Cormac prefers to work on a spreadsheet.

cormacanderson commented 3 years ago

Hi both. I'm ready to start going through whenever. Just explain to me clearly what you would like me to do. My preference is a .csv or .tsv file, to work on it offline and then to upload it again. Would that work?

LinguList commented 3 years ago

That also works, and you can still use the Google Sheet: just copy one sheet from there, work on it, and upload it directly to the sheet, if that is okay?

codecov-io commented 3 years ago

Codecov Report

Merging #26 into master will decrease coverage by 3.58%. The diff coverage is 34.93%.

@@            Coverage Diff             @@
##           master      #26      +/-   ##
==========================================
- Coverage   96.68%   93.10%   -3.59%     
==========================================
  Files          30       31       +1     
  Lines        1356     1435      +79     
==========================================
+ Hits         1311     1336      +25     
- Misses         45       99      +54     

| Impacted Files | Coverage Δ |
|---|---|
| src/pyclts/models.py | 100.00% <ø> (ø) |
| src/pyclts/commands/map.py | 11.66% <11.66%> (ø) |
| src/pyclts/transcriptionsystem.py | 96.70% <95.65%> (-0.23%) ↓ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update dfddc28...cbdb922.