digitallinguistics / data-format

The Data Format for Digital Linguistics (DaFoDiL)
https://format.digitallinguistics.io
MIT License

add Lexeme.UR #340

dwhieb closed this issue 1 year ago

dwhieb commented 5 years ago

Is this change related to a problem? Please describe.

Many linguists include an underlying representation (UR) of forms in their documentary and descriptive work. The UR is often an abstract form of the lexeme, and in some cases it never actually surfaces in texts.

Describe the solution you'd like

Add a Lexeme.UR field whose value is a Transcription object.
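As a rough illustration of the proposal, a lexeme record with a `UR` field might look like the sketch below. It assumes a Transcription object is a mapping from an orthography name to a string. The orthography key `phonemic`, the example strings, and the helper `is_transcription` are all hypothetical and are not taken from the DaFoDiL schemas.

```python
import json

# Hypothetical lexeme record. Only the field name "UR" comes from the
# proposal; the orthography key and both strings are invented.
lexeme = {
    "lemma": {"phonemic": "ase"},  # invented surface-like form
    "UR": {"phonemic": "aNe"},     # invented underlying form
}


def is_transcription(value):
    """Return True if value is a non-empty dict of string keys to string values.

    This is an assumed shape for a Transcription object, not the official
    schema definition.
    """
    return (
        isinstance(value, dict)
        and bool(value)
        and all(isinstance(k, str) and isinstance(v, str) for k, v in value.items())
    )


print(json.dumps(lexeme, ensure_ascii=False, indent=2))
```

Under this shape, a consumer could treat `UR` exactly like any other transcription-valued field.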

Describe alternatives you've considered

We might not want to include the UR, because this functionality might already be covered by the forms field: by convention, the UR would be the first form listed.

Additional context

The notion of a UR is closely tied to generativist theories of linguistics, and tends to be avoided by functionalists. Adding this field feels somewhat like adding a field to satisfy a particular theoretical outlook. It would be ideal if the notion of a UR could be captured using the already-existing schemas, with some adjustments in the descriptions of how the preexisting fields are meant to be used.

dwhieb commented 5 years ago

It would be good to talk to @monicamacaulay, @HunterLockwood, and others about this as well (maybe Brendon Yoder, Phillip Rogers).

monicamacaulay commented 5 years ago

Hmm. I see your point, but it's only generative in the sense that Structuralists were generativists! This arises for us in cases where there's some kind of morphophonemic stuff going on, e.g. a vowel changes its quality in a certain context. Bloomfield also uses old-fashioned morphophonemes, which basically capture historical wrinkles in morphophonemic behavior.

An example is his /N/. There was a PA (Proto-Algonquian) *n, which remained /n/ in Menominee, but PA *θ and *l fell together with n. However, the reflexes don't all show the same behavior. He uses /N/ to indicate the one that came from *θ or *l, and /n/ for regular *n. The reason is that /N/ → s / __ e, e:, and y (but regular /n/ does not; it stays [n]).

So that's why we need the UR field. Even if a given linguist didn't believe in underlying forms, here we're representing Bloomfield's analysis, not our own, and we want to maintain that so that it's always recoverable. Does that make sense? (How would a functionalist handle morphophonemes? I mean, it seems like any theory would have to have some sort of diacritic device like B's /N/.)
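The rule described in this comment can be sketched in a few lines of Python. This is only an illustration of why the underlying form has to be stored: the function name `surface_form` and the test strings are invented (they are not real Menominee words), and the fallback of /N/ to [n] in all other contexts is an assumption drawn from the statement that the sounds fell together with n.

```python
import re


def surface_form(underlying: str) -> str:
    """Derive a surface string from an underlying string containing /N/ or /n/."""
    # /N/ -> s before e, e:, or y. Since "e:" begins with "e", a lookahead
    # for "e" or "y" covers all three contexts.
    result = re.sub(r"N(?=[ey])", "s", underlying)
    # Assumption: everywhere else /N/ surfaces as [n], like regular /n/.
    return result.replace("N", "n")


# Invented strings, used only to exercise the rule.
for ur in ["aNe", "aNe:", "aNy", "aNa", "ane"]:
    print(ur, "->", surface_form(ur))
```

Once /N/ and /n/ are both written as n, the alternation can no longer be predicted from the spelling, which is the motivation for keeping the underlying form recoverable.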

dwhieb commented 4 years ago

@monicamacaulay This is really helpful. Thanks for taking the time to write up this example!

Are there any instances where the underlying form of a word is different from the headword (i.e. the lemma)? (Note that I'm not asking whether the underlying form of a word is ever different from the citation form of the word.)

For reference, here's how I'm defining lemma / headword in the DLx format:

A lemma is the form of a lexeme conventionally used to represent that lexeme. It may differ drastically from the citation form. For example, the form be is typically used as the lemma form of the English verb forms am, are, is, etc.

In FLEx, the lemma field is intended to be the underlying form. I'm thinking maybe that's how the DLx format should work too. (If so, I'd make it clear that that's how the lemma field is intended to be used.) What do you think?

dwhieb commented 4 years ago

After some discussion, it seems like the underlying representation is essentially the lemma. Closing this issue for now. We can revisit separating the two properties if a specific case arises that calls for it.
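Under that resolution, a consumer of the data would read the underlying form from the lemma field and would not need a separate property. A minimal sketch, assuming the lemma is a mapping from orthography name to string (the key `phonemic`, the example string, and the helper `underlying_form` are hypothetical):

```python
from typing import Optional


def underlying_form(lexeme: dict, orthography: str) -> Optional[str]:
    """Return the underlying form of a lexeme, read from its lemma.

    Returns None if the lexeme has no lemma or no entry for the orthography.
    """
    return lexeme.get("lemma", {}).get(orthography)


# Invented record: the lemma itself carries the abstract form.
lexeme = {"lemma": {"phonemic": "aNe"}}

print(underlying_form(lexeme, "phonemic"))
```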