Update sources

tresoldi commented 3 years ago

This PR reviews the mapping for all sources (with the exception of allenbai, beidasinitic, and bdpa, which I did not touch), also adding a handful of new sounds to BIPA, such as "voiceless retroflex approximant", and regenerating all packages and the app.

id	valid	total	percent
allenbai	114	115	0.99
apics	177	177	1.00
bdpa	1329	1466	0.91
bdproto	745	794	0.94
beidasinitic	145	145	1.00
chomsky	45	45	1.00
diachronica	561	652	0.86
eurasian	1478	1562	0.95
jipa	937	957	0.98
lapsyd	793	795	1.00
multimedia	137	138	0.99
nidaba	1927	1936	1.00
panphon	6272	6334	0.99
pbase	841	1068	0.79
phoible	3094	3182	0.97
powoco	371	378	0.98
ruhlen	533	701	0.76
sala	106	128	0.83
saphon	345	357	0.97
segbo	215	219	0.98
wiki	168	184	0.91
20			0.94
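As a quick sanity check on the table above, the percent column appears to be valid divided by total, rounded to two decimals. A minimal sketch, assuming that reading; the helper name `coverage` is hypothetical and not part of pyclts:

```python
def coverage(valid: int, total: int) -> float:
    """Return the share of mapped graphemes, rounded to two decimals."""
    return round(valid / total, 2)


# A few rows copied from the table above: source -> (valid, total).
rows = {
    "allenbai": (114, 115),
    "pbase": (841, 1068),
    "ruhlen": (533, 701),
}

for source, (valid, total) in rows.items():
    # Print in the same tab-separated layout as the table.
    print(f"{source}\t{valid}\t{total}\t{coverage(valid, total):.2f}")
```

Sorting the rows by this ratio would put the sources that still need the most mapping work, such as ruhlen and pbase, at the top.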

cormacanderson commented 3 years ago

What is it that we are missing from LAPSyD? I thought everything was mapped there now.

LinguList commented 3 years ago

@tresoldi, I cannot see this commit because of the number of files. But tell me please quickly: did you add the new column which we call "Symbols" to the tsv files in this run? If not, I strongly suggest doing this now, as it will otherwise again be a large PR. We will later also modify pyclts to add the symbols in the pkg/-files as well.

LinguList commented 3 years ago

@tresoldi, can you please answer my question on the Symbols? I'd like to know whether this still needs to be advanced or not. I mean, the CLTS command is already there, and routinely adds them. Just a quick reply please.

tresoldi commented 3 years ago

@LinguList I kept looking here and can't understand which "Symbols column" you mean. I followed your example as much as I could, and it was the same workflow for the mappings we already merged (i.e., phoible, lapsyd, jipa).

LinguList commented 3 years ago

@tresoldi, I wrote an email on this topic, and @cormacanderson and you both agreed that the feature was useful. Please pull the most recent pyclts code, and run the mapping one time, to see what I actually mean.

tresoldi commented 3 years ago

I know, it is the email from 22 Nov 2020 20:10 and I agree it is a good idea. What I meant is that I cannot understand what I should have changed in the workflow.

For all sources, I ran clts map first, manually proceeded in correcting/improving the mappings, generated each package with clts make_dataset, and finally regenerated the app with clts make_app.

I did this with a pyclts from your map-fix branch, up to commit https://github.com/cldf-clts/pyclts/commit/5d58dccb484f440f9438bc68469b0b6ca08cf1f9 which is the most recent I can see there.

I can revert the changes due to make_app, and even those due to all the make_dataset runs, if you prefer to get a simpler diff, after addressing the points raised by @cormacanderson first.

LinguList commented 3 years ago

Tiago, I cannot view the PR since it is mega large. Don't you see that? I ask you simply: do your files in our source folder HAVE the Symbols column, or do they NOT have it? Can you please just answer that question?

tresoldi commented 3 years ago

No, they don't have that column: https://github.com/cldf-clts/clts/blob/sources/sources/chomsky/graphemes.tsv

I ran clts map again from my setup and it is not adding it, unfortunately.

I will redo the PR only with the various graphemes.tsv, reverting all other files to the old versions.
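For anyone who wants to verify this locally without opening the diff, the header of a graphemes.tsv file can be checked directly. A minimal sketch, assuming the column is literally named "Symbols" as in the discussion above; the helper `has_column` and the sample header names are hypothetical and not taken from the actual files:

```python
import csv
import io


def has_column(tsv_text: str, column: str) -> bool:
    """Report whether the first row of a tab-separated text names `column`."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # an empty input yields an empty header
    return column in header


# Hypothetical file content for illustration; real headers may differ.
sample = "GRAPHEME\tBIPA\nʈ\tʈ\n"
print(has_column(sample, "Symbols"))
```

For a real file, the text could be read with `open(path, encoding="utf-8").read()` and passed to the same function.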

cormacanderson commented 3 years ago

I agree that the Symbols column is a good idea.

However, if you just change the graphemes.tsv files in this PR, without making the necessary changes to pkg/transcriptionsystems/bipa/consonants.tsv, won't some of the graphemes.tsv mappings fail?

Does it make sense to first make sure that all of the mappings are fine and then add in the Symbols in a new PR straight afterwards?

LinguList commented 3 years ago

Yep, so my proposal is, having clarified this: you two, @cormacanderson and @tresoldi, figure out how to make sure all is correct, @tresoldi merges this PR, and then we fix with the symbols, but this time: step-by-step, not in the form of monster pull-requests, where too many files are modified.

LinguList commented 3 years ago

So to finish this, I'd suggest quickly addressing the points made by @cormacanderson and then merging. We can later do the conversion with the symbols step by step, for better visibility.

tresoldi commented 3 years ago

Ok, I will address the points by @cormacanderson, use the code that was merged into pyclts, regenerate, and push.

tresoldi commented 3 years ago

Ok, I have updated the sources, addressed the issues, regenerated, and will now merge. Note that I did not touch allenbai, beidasinitic, and bdpa.

cldf-clts / clts

Update sources #77