distantreading / WG1

Discussion documents and working papers from WG1

TitleStudy: splitting up words (new formations) #42

Closed jberenike closed 4 years ago

jberenike commented 4 years ago

How do we deal with splitting up words? New formations like "Sonnenwirt" in German (with its productive morphology, in this case composition) would be presented in English as two words! sonnen = otherEntity; wirt = person, status see issue #36

galleron commented 4 years ago

As I said before, I think you can put these two pieces of information in the two columns you indicate.

jberenike commented 4 years ago

... this issue is related to the one on genre indicators. I have not split up these cases, adopting the position that we are dealing with a noun whose (predominant) function in the discourse is to indicate an entity's existence and status. Also, this type of new formation is highly conventional (productive morphology), so no special attention is drawn to its "novelty."

jberenike commented 4 years ago

... discussing with myself here! In the case of "Amazonenschlacht" (deu059) (battle of the Amazons), it seems more appropriate to code two units. Argument: the semantics of "Schlacht" as head of the compound calls for an agent (whose battle?). By contrast, the Sonnenwirt appears further removed from the physical location of the inn; in fact, by (distant) connotation/semantic prosody, the attributes of "sun" may be assigned to him.

galleron commented 4 years ago

I think the discussion with yourself explained more clearly what I meant a few messages ago ;-)! I think we need to be semantically consistent, even if this means annotating a similar grammatical construction in two different ways.

CarolinOdebrecht commented 4 years ago

Maybe this is too late, but I would annotate Sonnenwirt as a Person. It is a reference to a person, and we do not care about the morphological complexity.

jberenike commented 4 years ago

Don't think it's too late. Working on the final annotations.

I share @CarolinOdebrecht's intuition about Sonnenwirt.

How about "Amazonenschlacht"? "Bündnergeschichte"? I am struggling to find a clear annotation rule here.
That rule could include: "don't split up words and just assign one referent per word" (see below for my solution, for now!)

So:
Sonnenwirt == Person
Amazonenschlacht == otherEntity (!not amazonen == person; schlacht == otherEntity)
Bündnergeschichte == GenreIndicator (!not bündner == place; geschichte == genreindicator)

In prior annotation studies (of metaphor), we used a (corpus-based) dictionary as an external resource for determining lexical units. If a word was represented by a dictionary lemma, we would not split it up. If it wasn't, we would split it up, on the basis of: "this is a novel formation, and the two composite lexemes are still semantically independent." (Now, to complicate things, German has a productive morphology, and the dictionary obviously cannot cover all conventionally formed lexemes.)
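A minimal sketch of that dictionary-based rule, only to make the decision explicit. It assumes a hypothetical set of lemmata standing in for the dictionary and hand-supplied compound parts; the name `lexical_units`, the sample entries, and the "Haustür" example are illustrative and not taken from the earlier studies.

```python
# Hypothetical stand-in for a corpus-based dictionary; entries are illustrative only.
DICTIONARY_LEMMATA = {"haustür", "schlacht", "wirt"}


def lexical_units(word, parts, lemmata=DICTIONARY_LEMMATA):
    """Return the units to annotate under the dictionary-lookup rule.

    word  -- the compound as it appears in the title
    parts -- hand-supplied constituents of the compound (assumed input)
    """
    if word.lower() in lemmata:
        # A lemma exists: treat the compound as a single lexical unit.
        return [word]
    # No lemma: treat it as a novel formation and split it into its parts.
    return list(parts)


print(lexical_units("Haustür", ["Haus", "Tür"]))
print(lexical_units("Amazonenschlacht", ["Amazonen", "Schlacht"]))
```

As the parenthesis above notes, such a lookup would over-split conventionally formed German compounds that simply have no dictionary entry, which is why it cannot be the whole rule.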

Anyways, for the three examples, I checked, and neither Duden nor DWDS features a lemma for any of the three (the DWDS says: no contemporary entry). However, the DWDS does come up with a substantial number of corpus hits for "Sonnenwirt" and "Amazonenschlacht" https://www.dwds.de/?q=Amazonenschlacht (not for "Bündnergeschichte"). For Sonnenwirt, this confirms my intuition: it's just a person, and it is unlikely to be interpreted as having a "place reference" in the discourse.

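(Heinrich Heine, 1826: "Der Sonnenwirth lächelte gar schlau und mochte wohl wissen, daß der Carzer von den Studenten in Göttingen Hotel de Brühbach genannt wird." Roughly: "The Sonnenwirth smiled quite slyly and probably knew that the students in Göttingen call the Carzer Hotel de Brühbach.")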

Important: Amazonenschlacht --> the meaning of the word, whether lexicalized or not, does have a person referent 'female warrior'. ( checked corpus hits https://www.dwds.de/r?corpus=dtaxl;q=Amazonenschlacht )

So, my question is: should I
(a) split up the lexemes: amazonen (person) + schlacht (otherentity)
(b) leave intact the lexemes, but assign entities: amazonenschlacht (person) AND amazonenschlacht (otherEntity)
(c) leave intact the lexemes and decide what is more prevalent in terms of meaning: amazonenschlacht --> EITHER person OR otherEntity

My solution for now: "multiple annotation," deciding by the question: what is the conventional referent of the word (around 1900)? If this meaning features multiple referents, these are all annotated (but the word is not split up). Important: this is different from looking at morphology!

--> I go by strategy (b): leaving the lexeme intact, but assigning multiple referents (for "amazonenschlacht" and "bündnergeschichte", but not for "sonnenwirt").
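A rough sketch of what strategy (b) could look like as data, assuming a simple mapping from the intact lexeme to a list of referent categories. The structure and the name `to_rows` are illustrative assumptions, not the actual annotation table; the two categories for "Bündnergeschichte" are inferred from the split considered earlier in this thread.

```python
# Illustrative annotations under strategy (b): the lexeme stays intact and
# carries one or more referent categories. The data structure is an assumption.
ANNOTATIONS = {
    "Sonnenwirt": ["person"],
    "Amazonenschlacht": ["person", "otherEntity"],
    "Bündnergeschichte": ["place", "genreIndicator"],  # inferred, see lead-in
}


def to_rows(annotations):
    """Flatten to (lexeme, category) rows: one row per referent, word never split."""
    return [
        (lexeme, category)
        for lexeme, categories in annotations.items()
        for category in categories
    ]


for row in to_rows(ANNOTATIONS):
    print(row)
```

Each row keeps the full compound, so the decision rests on the conventional referents of the word and not on its morphological parts.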

galleron commented 4 years ago

I agree with strategy (b), because I think it will give us the most information about "what's in a title". This is also the reason why I finally decided that we should annotate a place in formations such as "Parisian habits" or "Turkish wars", even if the decision is semantically debatable. My understanding is that we are looking at what a title suggests to a potential reader ("this will be about a person, or about a place, or about something that happened in a place, etc."), and multiple annotations seem an appropriate way to capture the wealth of suggestions a title makes. I guess one of the things we'll study is to what extent certain collections are more informative, title-wise, while others are more suggestive.