Closed tresoldi closed 6 years ago
Oops, there's a conflict; I'll resolve it.
I should have pulled before starting to work; I see wikipedia.tsv was changed by @LinguList on master in preparation for the paper.
There is no rush in fixing this pull request, and it is probably better to wait for the paper to be evaluated; just tell me what you prefer me to do. My suggestion would be to extend the wiki.tsv file on master with any information (sounds) from this list.
Yes, there's no rush. We have our numbers for the paper, and you did an especially great job in keeping up with the app and all the discussions remotely. Looking forward to pursuing our work!
Updated the new Wikipedia transcription data file. Some sounds are missing from BIPA; I've marked them in the NOTE field.
I realize two things:
1 I should've told you that wikipedia.tsv should first be put into the sources/ folder, from which it can be linked automatically (or manually) to the transcriptiondata folder using the clts td command. (I use that to make the semi-automatic linking easier, and one can test what is already covered.)
2 The missing sounds call for an addition to the source code and for new features to be added (linguo-labial; the voiced bilabial tap consonant should be added manually to BIPA, and the retroflex implosive series as well).
I think we can open separate issues for the missing sounds, so there is no need to add them to BIPA now. By putting this file into sources now and deleting the sounds missing in BIPA, we will have an automatic list of NA sounds that are not in BIPA, which we can then systematically (or unsystematically) add in the future.
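To illustrate the idea of an automatic list of NA sounds, here is a minimal sketch that reads a tab-separated file and collects the rows without a BIPA match. The column names (GRAPHEME, BIPA, NOTE), the "NA" marker, and the sample rows are assumptions made for illustration; this is not the clts td implementation.

```python
import csv
import io


def missing_from_bipa(tsv_text, bipa_column="BIPA", missing_marker="NA"):
    """Return the rows whose BIPA column is empty or carries the missing marker.

    The column name and the marker are hypothetical defaults, not taken
    from the actual CLTS file layout.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    missing = []
    for row in reader:
        # Short rows yield None for absent columns, so fall back to "".
        value = (row.get(bipa_column) or "").strip()
        if value in ("", missing_marker):
            missing.append(row)
    return missing


# Invented sample data: one sound with a BIPA match and two without.
sample = (
    "GRAPHEME\tBIPA\tNOTE\n"
    "p\tp\t\n"
    "ᶑ\tNA\tmissing from BIPA\n"
    "ⱱ\t\tmissing from BIPA\n"
)

missing = missing_from_bipa(sample)
print([row["GRAPHEME"] for row in missing])
```

Each unmatched row could then become its own issue, as suggested above.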
It makes sense, I assumed the NAs were coming from some private, temporary script of yours. I'll do the necessary updates to bipa and put the file in sources/
I should've added more documentation on how this is done. Something we can put on our todo-list for our official release of the package.
Took me a whole week for something that I could have done in five minutes... The current PR generates the data in transcriptiondata from the one in sources.
I haven't touched BIPA in terms of the missing sounds for two reasons. First, it seems more appropriate to do that in a separate PR; second, I'm still not entirely sure of what should be changed (taking a look at __main__.py suggests that dealing with data/features.tsv would be enough, besides calling clts again to regenerate what is needed).
As it does not seem to be urgent, maybe we should work on documentation and some more tests first?
I'll look into this on Wednesday. I have an urgent paper deadline before...
We can always do without adding the missing sounds, but the order of entities in the file is now mixed. If you look at the TSV header (or it's just too early on a Sunday morning for me), we have BIPA -> URL -> FEATURES (= same as the URL without the prefix, at least in the current version) -> Grapheme, but the order is mixed, and the features are not necessarily the same as the URL later (although one could change all features consistently by just deleting the spaces).
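The consistency check hinted at here could be sketched as follows: strip the prefix from the URL, delete the spaces from the feature string, and compare the two. The prefix, the column names, and the sample rows are hypothetical placeholders, since the real values are not given in this thread.

```python
def features_match_url(features, url, url_prefix):
    """Check whether FEATURES, with spaces deleted, equals the URL minus its prefix."""
    if not url.startswith(url_prefix):
        return False
    return features.replace(" ", "") == url[len(url_prefix):]


# Hypothetical prefix and rows, for illustration only.
PREFIX = "https://example.org/sounds/"
rows = [
    {"FEATURES": "voiced bilabial plosive", "URL": PREFIX + "voicedbilabialplosive"},
    {"FEATURES": "voiced bilabial tap", "URL": PREFIX + "voiced_bilabial_flap"},
]

mismatches = [
    row["FEATURES"]
    for row in rows
    if not features_match_url(row["FEATURES"], row["URL"], PREFIX)
]
print(mismatches)
```

Rows reported as mismatches are the ones where the features and the URL have drifted apart and would need a manual look.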
> I haven't touched BIPA in terms of the missing sounds for two reasons. First, it seems more appropriate to do that in a separate PR; second, I'm still not entirely sure of what should be changed (taking a look at __main__.py suggests that dealing with data/features.tsv would be enough, besides calling clts again to regenerate what is needed).
data/features.tsv is one part to be changed, but if new sounds are introduced, the algorithm of course also requires that the respective sounds are added. If it's a diacritic, it can be handled by changing diacritics.tsv in bipa, but if it's more complex, it needs to be added to bipa/consonants.tsv, etc.
Regarding plain tests, the thing is more or less covered (@xrotwang would probably say "less"). But what you mean by documentation is what we call "contributing" in concepticon: how can you propose changes? It is probably easiest to run through this if I find time to add a new sound and describe what needs to be done before a new PR. But unfortunately, I am quite busy until the middle of next week, as I have an urgent paper deadline on unrelated topics...
First of all, no rush to answer, I imagine that besides your paper you probably also have the April workshop to organize.
This is a bit embarrassing, I can't get it right. On the matter:
lexibank), I see I was ambiguous. I should be able to make it right now. I will modify the sources/wiki on master by only adding the missing sounds (i.e., no completely new information), run clts to check for the missing sounds, and add them to IPA. Sorry for so much trouble for such a little change; my mind is also occupied by a lot of other things. ;)
No problem. We are under no pressure with this. Aiming for around April (or even the beginning of May) to publish online is extremely realistic, even if my plan of beautifying the clld CSS and adding some fancy colors is feasible ;) Even in the current version, the app could go online, but your input is extremely valuable to fill some gaps. Let's all take our time and reserve our stamina for April, when we can really start.
Closing this now; please reopen in due time if the data is ready for this.
As discussed at https://github.com/cldf/clts/issues/102
This extends the Wikipedia data with ~150 sounds and orders them.
It is important to note that some sounds seem to be missing from BIPA. I've manually compiled their names using similar sound names as reference, but this should be checked. The sounds are (with the grapheme used on Wikipedia and my manually composed name):