digling / burmish

LingPy plugin for handling a specific dataset
GNU General Public License v2.0

Missing language Burmese in Mann1998 spreadsheet #89

Open LinguList opened 7 years ago

LinguList commented 7 years ago

I just worked on Mann1998, as can be seen from the README (https://github.com/digling/burmish/tree/master/datasets/Mann1998), and I realized that the spreadsheet does not contain all the data given in the source. Since Mann also lists Burmese (as WB), this information is essential for us and should not have been deleted from the spreadsheet. The question is how well it can be re-added without undoing too many of the corrections already included in the current orthography profile. Ideally, the reconstructions should also be introduced at this stage, as they would allow us to test the consistency of predictions directly via quantitative pattern analysis.

The spreadsheet I have been using so far is here:

Mann.csv.zip (https://github.com/digling/burmish/files/779951/Mann.csv.zip)

It corresponds to the file Mann.csv (https://github.com/digling/burmish/blob/master/datasets/Mann1998/raw/Mann.csv) and should replace it if it is updated further. Alternatively, the missing readings could be provided in a separate spreadsheet that uses the same concepts as the one above, so that we can identify the partial cognates.
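If the missing Written Burmese readings arrive as a separate spreadsheet keyed by the same concepts, joining the two is essentially a lookup on the concept label. The sketch below is a minimal illustration, assuming both files are CSV, that both have a `CONCEPT` column, and that the Burmese readings sit in a column called `WB`; these column names, the function name `merge_by_concept`, and the sample rows are all hypothetical, not the actual layout of Mann.csv.

```python
import csv
import io


def merge_by_concept(main_rows, extra_rows, column="WB"):
    """Copy `column` from extra_rows into main_rows, matching on CONCEPT.

    Returns the merged rows plus the concepts that found no match, so gaps
    in the extra spreadsheet are reported instead of silently dropped.
    """
    lookup = {row["CONCEPT"]: row[column] for row in extra_rows}
    merged, missing = [], []
    for row in main_rows:
        new_row = dict(row)
        if row["CONCEPT"] in lookup:
            new_row[column] = lookup[row["CONCEPT"]]
        else:
            new_row[column] = ""
            missing.append(row["CONCEPT"])
        merged.append(new_row)
    return merged, missing


# Hypothetical sample data standing in for Mann.csv and the extra sheet.
main_csv = "CONCEPT,FORM\nhand,form-1\neye,form-2\n"
extra_csv = "CONCEPT,WB\nhand,wb-1\n"

main_rows = list(csv.DictReader(io.StringIO(main_csv)))
extra_rows = list(csv.DictReader(io.StringIO(extra_csv)))
merged, missing = merge_by_concept(main_rows, extra_rows)
print(missing)  # ['eye']
```

This only works if both sheets use identical concept labels, which is exactly why keeping the concepts the same as in the original spreadsheet matters.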

nh36 commented 7 years ago

How important is it that Mann's Burmese data keep Mann's exact orthography?

LinguList commented 7 years ago

I don't mind, but it needs to be noted in the README under the ortho-profile section (e.g. "Mann's Burmese was converted directly to our basic Burmese transcription..."). In that case, the ortho-profile will need to be adjusted (I'd even prefer a separate profile for Burmese, which could hopefully be re-used).

Please read my annotations on the ortho-profile I created carefully, as it may be linguistically wrong or disputable; I only followed pragmatic decisions that seemed reasonable to me.

LinguList commented 7 years ago

BTW: my modification of the README shows how I imagine it: information in tables using the simple table syntax, sources indicated for each language, etc. This will make it easier to get the data into a lexibank-style format, which in turn makes it comparable across our datasets, since the BED currently runs as software that draws on the lexibank code and ideas.

We won't be able to predict everything that CAN be regularized right now, so some things will need to be readjusted later in the process, but this is the fate of all systematization attempts.

nh36 commented 7 years ago

OK, I will redo the other README files to match. (I will keep the BibTeX information, though, because until we have an automated link to it, this information is not redundant.)

LinguList commented 7 years ago

No, this is all perfectly fine. Keeping it is no problem, as long as it is placed systematically under a header. It is harder to add than to delete. I recommend using headers written as:

## my sub title

That is, use ## for second-order headers, ### for third-order headers, etc., similar to \subsection{} etc. in LaTeX.

nh36 commented 7 years ago

Mann's Burmese is a mess. I have copied it out as exactly as I can. We will need an orthography profile to make sense of it, and the best approach is probably to compare it directly with someone else's Burmese, e.g. Nishi's. Because Mann treats Burmese as less closely related, he also plays fast and loose with its phonology, so his cognate judgements must be taken with a grain of salt. In any case, the new spreadsheet is attached to this thread: Mann1998 redo.zip

nh36 commented 7 years ago

Please read this issue, and close or update it as necessary.

LinguList commented 7 years ago

Okay, the ball goes back to @nh36: please review the orthoprofile of Mann1998, to which I added Burmese (and corrected the most obvious problems, like the glottal stop and the ":" instead of the correct length mark, if that was intended). What we need now is a correction of the third column for the common tone in Written Burmese (I suppose the length mark and the small glottal stop mean something). So please have a look at the profile and compare. BTW: in profiles I use "nn" instead of "N" for convenience, to indicate a second nucleus vowel (like an off-glide). I'll normalize all cases later on. The part we are talking about starts at line 300+:

I added another column, called "examples", which is useful when working with a profile (up to 5 examples; if there are fewer, those are the only ones), as they show how the full morpheme looks in the language.
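To make the profile mechanics concrete: a profile maps graphemes to target symbols, words are segmented by matching the longest grapheme first, and the "examples" column can be filled by collecting a few words in which each grapheme occurs. The sketch below is a minimal stdlib illustration under those assumptions; the function names `segment` and `collect_examples` and the toy profile entries are hypothetical, not the actual Mann1998 profile, and dedicated tools such as the `segments` Python package handle this far more thoroughly.

```python
from collections import defaultdict


def segment(word, profile):
    """Greedy longest-match segmentation of `word` using profile graphemes.

    Characters not covered by the profile are emitted as '<c?>' so they
    can be spotted and added to the profile later.
    """
    max_len = max(len(g) for g in profile)
    segments, i = [], 0
    while i < len(word):
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in profile:
                segments.append(chunk)
                i += size
                break
        else:
            segments.append(f"<{word[i]}?>")
            i += 1
    return segments


def collect_examples(words, profile, limit=5):
    """Map each grapheme to at most `limit` distinct words containing it."""
    examples = defaultdict(list)
    for word in words:
        for seg in segment(word, profile):
            if seg in profile and word not in examples[seg] and len(examples[seg]) < limit:
                examples[seg].append(word)
    return dict(examples)


# Toy profile (grapheme -> IPA); placeholder entries, not the real profile.
profile = {"k": "k", "kh": "kʰ", "a": "a", "a:": "aː"}
words = ["kha:", "ka", "ka:"]

print(segment("kha:", profile))                        # ['kh', 'a:']
print([profile[s] for s in segment("kha:", profile)])  # ['kʰ', 'aː']
print(collect_examples(words, profile))
```

Longest match is what lets "kh" win over "k" and "a:" over "a"; unknown characters are marked rather than dropped so that gaps in the profile stay visible.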

nh36 commented 7 years ago

I really can't do this without seeing how we have treated another Burmese dataset. Burmese has four tones: 'level', 'high', 'creaky' and 'killed'. These terms are not easy to convert into IPA (it is like Middle Chinese). My preference would have been to convert Mann's Burmese first into some standard Burmese transcription and then use whatever ortho-profile we already have for that transcription to convert to IPA. A similar set of issues came up with the WBur. data from Huang that is already in the EDICTOR.
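The two-step route proposed here (Mann's notation to a standard Burmese transcription, then an existing profile to IPA) amounts to composing two mappings over already-segmented forms. The sketch below assumes segmentation has already happened and uses hypothetical function names (`apply_profile`, `mann_to_ipa`) and placeholder profile entries; the real correspondences, including the tones, still have to be worked out.

```python
def apply_profile(segments, profile):
    """Replace each segment via `profile`; unmapped segments pass through unchanged."""
    return [profile.get(seg, seg) for seg in segments]


def mann_to_ipa(segments, mann_to_burmese, burmese_to_ipa):
    """Two-step conversion: Mann's notation -> standard Burmese transcription -> IPA."""
    return apply_profile(apply_profile(segments, mann_to_burmese), burmese_to_ipa)


# Placeholder profiles for illustration only.
mann_to_burmese = {"a:": "aa"}
burmese_to_ipa = {"aa": "aː", "kh": "kʰ", "k": "k", "a": "a"}

print(mann_to_ipa(["kh", "a:"], mann_to_burmese, burmese_to_ipa))  # ['kʰ', 'aː']
```

The advantage of this design is that the second profile can be shared with other Burmese datasets; the pass-through behaviour is convenient but hides unmapped segments, so a stricter variant would flag them instead.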

LinguList commented 7 years ago

Well, then let's do it this way: you extract the data from line 344 and create a profile from Mann's Burmese to the Burmese transcription used in the rest of the data, AND a near-IPA profile (just two columns). This can even go in the same document: just use the "SOURCE" column for the standard Burmese transcription and the CLPA column for the "IPA-like" form. If problems with tones arise, assign them as you please, using whatever symbols (e.g. H, L, F), but don't forget to use the source/target annotation, so "L/" is one possibility, or alternatively "L/¹" (the ¹ indicates low tone in Chao letters, etc.). Creaky voice is usually assigned to the main vowel, as I have done for the other examples in Mann1998. The current CLPTS, the "cross-linguistic phonetic transcription system", is flexible enough to handle all these cases.
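The source/target annotation can be read as two parallel views of the same segmented form: what the source writes, and what it is analysed as. The sketch below assumes each segment contains at most one meaningful slash and splits on the first one; the function names are hypothetical and not part of LingPy's API.

```python
def split_annotation(segment):
    """Split a 'source/target' segment; a plain segment is its own target.

    An empty target, as in 'L/', comes back as an empty string.
    """
    if "/" in segment:
        source, target = segment.split("/", 1)
        return source, target
    return segment, segment


def source_and_target(segments):
    """Return the source view and the target view of an annotated segment list."""
    pairs = [split_annotation(seg) for seg in segments]
    return [s for s, _ in pairs], [t for _, t in pairs]


# 'L/¹': written as L in the source, rendered as the Chao-style ¹ in the target.
source, target = source_and_target(["k", "a", "L/¹"])
print(source)  # ['k', 'a', 'L']
print(target)  # ['k', 'a', '¹']
```

Keeping both views means the original tone label is never lost, even if the chosen target symbol is revised later.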