cldf-clts / clts

Cross-Linguistic Transcription Systems
https://clts.clld.org

Update sources #77

Closed tresoldi closed 3 years ago

tresoldi commented 3 years ago

This PR reviews the mappings for all sources (except allenbai, beidasinitic, and bdpa, which I did not touch). It also adds a handful of new sounds to BIPA, such as "voiceless retroflex approximant", and regenerates all packages and the app.

id            valid  total  percent
allenbai        114    115     0.99
apics           177    177     1.00
bdpa           1329   1466     0.91
bdproto         745    794     0.94
beidasinitic    145    145     1.00
chomsky          45     45     1.00
diachronica     561    652     0.86
eurasian       1478   1562     0.95
jipa            937    957     0.98
lapsyd          793    795     1.00
multimedia      137    138     0.99
nidaba         1927   1936     1.00
panphon        6272   6334     0.99
pbase           841   1068     0.79
phoible        3094   3182     0.97
powoco          371    378     0.98
ruhlen          533    701     0.76
sala            106    128     0.83
saphon          345    357     0.97
segbo           215    219     0.98
wiki            168    184     0.91
20 0.94
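
The percent column above looks like valid divided by total, rounded to two decimals, and the 0.94 in the last row is consistent with the mean of that column. A minimal Python sketch that reproduces a few rows under that assumption (the function name coverage is made up for illustration and is not part of pyclts):

```python
def coverage(valid: int, total: int) -> str:
    """Return valid/total formatted to two decimals, as in the table."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{valid / total:.2f}"


# A few (id, valid, total) rows copied from the table above.
rows = [("allenbai", 114, 115), ("pbase", 841, 1068), ("ruhlen", 533, 701)]

for source_id, valid, total in rows:
    # Prints one line per source: id, valid, total, computed ratio.
    print(source_id, valid, total, coverage(valid, total))
```

Sorting the rows by this ratio is a quick way to see which sources (here pbase and ruhlen) still have the most unmapped graphemes.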

cormacanderson commented 3 years ago

What is it that we are missing from LAPSyD? I thought everything was mapped there now.

LinguList commented 3 years ago

@tresoldi, I cannot view this commit because of the number of files it touches. But please tell me quickly: did you add the new column, which we call "Symbols", to the tsv files in this run? If not, I strongly suggest doing this now, as it will otherwise be a large PR again. Later we will also modify pyclts to add the symbols in the pkg/ files as well.

LinguList commented 3 years ago

@tresoldi, can you please answer my question on the Symbols? I'd like to know whether this still needs to be advanced or not. I mean, the CLTS command is already there, and routinely adds them. Just a quick reply please.

tresoldi commented 3 years ago

@LinguList I kept looking here and can't understand which "Symbols column" you mean. I followed your example as much as I could, and it was the same workflow for the mappings we already merged (i.e., phoible, lapsyd, jipa).

LinguList commented 3 years ago

@tresoldi, I wrote an email on this topic, and @cormacanderson and you both agreed that the feature was useful. Please pull the most recent pyclts code, and run the mapping one time, to see what I actually mean.

tresoldi commented 3 years ago

I know, it is the email from 22 Nov 2020 20:10 and I agree it is a good idea. What I meant is that I cannot understand what I should have changed in the workflow.

For all sources, I ran clts map first, then manually corrected and improved the mappings, generated each package with clts make_dataset, and finally regenerated the app with clts make_app.

I did this with a pyclts from your map-fix branch, up to commit https://github.com/cldf-clts/pyclts/commit/5d58dccb484f440f9438bc68469b0b6ca08cf1f9 which is the most recent I can see there.

I can revert the changes due to make_app, and even those due to all the make_dataset runs, if you prefer a simpler diff, after first addressing the points raised by @cormacanderson.

LinguList commented 3 years ago

Tiago, I cannot view the PR since it is mega large. Don't you see that? I ask you simply: do your files in our source folder HAVE the Symbols column, or do they NOT have it? Can you please just answer that question?

tresoldi commented 3 years ago

No, they don't have that column: https://github.com/cldf-clts/clts/blob/sources/sources/chomsky/graphemes.tsv

I ran clts map again from my setup and it is not adding it, unfortunately.

I will redo the PR only with the various graphemes.tsv, reverting all other files to the old versions.
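
Whether a given graphemes.tsv has the column can be checked by reading only its header row. A minimal Python sketch, assuming the files are plain tab-separated text with one header line; the helper name has_column, the sample contents, and the exact spelling of the column name are assumptions for illustration, not taken from pyclts:

```python
import csv
import io


def has_column(tsv_text: str, column: str) -> bool:
    """Return True if the header row of a TSV document contains `column`."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # empty input yields an empty header
    return column in header


# Hypothetical file contents; the real column names may differ.
sample = "GRAPHEME\tBIPA\nb\tb\n"

print(has_column(sample, "BIPA"))     # True
print(has_column(sample, "Symbols"))  # False
```

For a file on disk, the same function can be fed the result of reading the file as UTF-8 text.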

cormacanderson commented 3 years ago

I agree that the Symbols column is a good idea.

However, if you change only the graphemes.tsv files in this PR, without making the necessary changes to pkg/transcriptionsystems/bipa/consonants.tsv, won't some of the graphemes.tsv mappings fail?

Does it make sense to first make sure that all of the mappings are fine and then add in the Symbols in a new PR straight afterwards?
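
The concern can be illustrated with a small consistency check: a mapping fails when its target sound is not in the transcription system's inventory. A minimal Python sketch with made-up data; the function name, the sample graphemes, and the dictionary layout are assumptions for illustration and do not reflect how pyclts validates mappings:

```python
def unmapped_targets(mappings: dict[str, str], inventory: set[str]) -> dict[str, str]:
    """Return the source graphemes whose target sound is missing from the inventory."""
    return {src: tgt for src, tgt in mappings.items() if tgt not in inventory}


# U+027B (retroflex approximant) + U+030A (combining ring above),
# standing in for a newly added sound such as a voiceless retroflex approximant.
new_sound = "\u027b\u030a"

# Hypothetical inventory that has not yet been updated with the new sound.
inventory = {"p", "b", "t"}

# Hypothetical source-to-BIPA mappings; the last one targets the new sound.
mappings = {"p": "p", "B": "b", "rr": new_sound}

missing = unmapped_targets(mappings, inventory)
print(missing)  # only the "rr" entry is reported

# Once the inventory includes the new sound, nothing is reported.
print(unmapped_targets(mappings, inventory | {new_sound}))
```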

LinguList commented 3 years ago

Yep, so my proposal is, having clarified this: you two, @cormacanderson and @tresoldi, figure out how to make sure all is correct, @tresoldi merges this PR, and then we fix with the symbols, but this time: step-by-step, not in the form of monster pull-requests, where too many files are modified.

LinguList commented 3 years ago

So to finish this, I'd suggest quickly addressing the points made by @cormacanderson and then merging. We can do the conversion with the symbols later, step by step, for better visibility.

tresoldi commented 3 years ago

Ok, I will address the points by @cormacanderson, use the code that was merged into pyclts, regenerate, and push.

tresoldi commented 3 years ago

Ok, I have updated the sources, addressed the issues, regenerated, and will now merge. Note that I did not touch allenbai, beidasinitic, and bdpa.