UniversalDependencies / UD_Irish-IDT

Irish data

Various fixes based on QA scripts #163

Closed kscanne closed 3 months ago

kscanne commented 3 months ago

I've continued developing the suite of scripts that I started more than a year ago (https://github.com/kscanne/grammatach) that aim to correctly predict all feature values based on the tags/features of neighboring words and dependency relations. I'm using these to improve end-to-end POS tagging based on UDPipe, but they're also useful for QA for existing treebanks.

I think everything here will be non-controversial.

This patch includes fixes for some of the words with underscores discussed in issue #92, but only examples like "i_mo", "i_do", "le_mo", etc., which we all agreed should lemmatize to "i", "le", etc. (following "ina", "lena", etc.). This will likely cause merge conflicts with Lauren's PR #148. I'd be happy to prepare a new one once we've settled on how to handle "caidé" and so on.
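As a rough illustration of the underscore fix described above (not the actual patch or the grammatach code), the relemmatization could be sketched as below. It assumes standard 10-column CoNLL-U token lines; the function name `fix_conllu_line` and the table `UNDERSCORE_LEMMAS` are hypothetical, and the table lists only the three forms named in this thread.

```python
# Hypothetical table: only the forms named in the thread, not a complete list.
UNDERSCORE_LEMMAS = {"i_mo": "i", "i_do": "i", "le_mo": "le"}


def fix_conllu_line(line: str) -> str:
    """Return the line with LEMMA corrected for the known underscore forms."""
    if not line.strip() or line.startswith("#"):
        return line  # comments and sentence breaks pass through unchanged
    cols = line.split("\t")
    if len(cols) != 10:
        return line  # not a standard 10-column token line; leave it alone
    # Column 1 is FORM, column 2 is LEMMA (0-based indices).
    new_lemma = UNDERSCORE_LEMMAS.get(cols[1].lower())
    if new_lemma is not None:
        cols[2] = new_lemma
    return "\t".join(cols)
```

Forms not in the table, such as "caidé", are left untouched, which matches the plan to settle those separately.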

tlynn747 commented 3 months ago

Brilliant to have this cleanup going on, especially the Definiteness as it's so involved. Thanks Kevin.

I've accepted all, but I'm unsure about the switch from obj to nsubj at line 16042. This is likely due to my misunderstanding of the sense. What's the translation of "ar gníomhartha de chuid bord sláinte iad a glacadh"?

Line 35189: why is the case not Dative if it's a PP object?

Sentence 169 (test): why is Tom labelled as Foreign=Yes? "B'fhearr dúinn imeacht isteach a Tom."

kscanne commented 3 months ago

For the one at 16042, the constituent to focus on is just "ar gníomhartha de chuid bord sláinte iad..." (that are actions of a health board...). Then the "a glacadh..." that follows starts a relative clause describing the actions. Here "glacadh" is a past autonomous verb and not a verbal noun. So "... (actions) that were taken, in view of the Ombudsman, according to that advice".

At 35189, this pub (in Bearna!) would be "Tigh Donnelly" even in a nominative context. True that "tigh" can be a dative of "teach", but it's also common as an alternate nominative form, especially in Munster Irish.

On Tom in sentence 169: I debated this one. My argument is that had it been "Tomás", they'd surely have said "a Thomáis" in the vocative, and there's no lenition here. Of course there are hundreds of other personal names that I haven't flagged with Foreign=Yes... this one was caught by the scripts because of the missing séimhiú.
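To make the "missing séimhiú" check concrete, here is a minimal sketch of how such a flag might work. It is not the grammatach implementation. It assumes tokens are given as dicts with `form` and `feats` keys and that vocative nouns carry `Case=Voc`; all function names are hypothetical, and the lenition test is simplified to "lenitable initial consonant followed by h".

```python
LENITABLE = set("bcdfgmpst")  # consonants that can take a séimhiú


def is_lenited(form: str) -> bool:
    """True if the word starts with a lenitable consonant followed by 'h'."""
    return len(form) > 1 and form[0].lower() in LENITABLE and form[1].lower() == "h"


def has_lenitable_initial(form: str) -> bool:
    """True if the initial consonant could in principle be lenited."""
    if not form or form[0].lower() not in LENITABLE:
        return False
    # Simplification: 's' resists lenition before these consonants (sc, sp, st, ...).
    if form[0].lower() == "s" and form[1:2].lower() in set("cfmptv"):
        return False
    return True


def unlenited_vocatives(tokens):
    """Return forms marked Case=Voc that could be lenited but are not."""
    return [
        t["form"]
        for t in tokens
        if "Case=Voc" in t["feats"].split("|")
        and has_lenitable_initial(t["form"])
        and not is_lenited(t["form"])
    ]
```

Under these assumptions "Tom" is reported while "Thomáis" is not; whether a reported token should then receive Foreign=Yes is a separate annotation decision.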

tlynn747 commented 3 months ago

Yes - I thought glacadh was a verbal noun. Makes sense.

For Tigh, even if it can be a nominative form, I'm wondering why it's not marked dative in this oblique context, where the dative is usually found, i.e. following a preposition.

Re the names marked as Foreign: yes, probably because they're not systematically labelled as such. That probably opens up a discussion around Foreign use of names in general. How much of it would be borrowing vs. loan words vs. real switches to English? See ch. 6 in Lauren's thesis for more on that: https://doras.dcu.ie/29326/

kscanne commented 3 months ago

Regarding Tigh, we're only marking Case=Dat where there's a special spelling of the dative in modern Irish (ar leith, in Éirinn, etc.). You have Case=Nom in all other dative contexts (what used to be labeled the "Common" form). And Tigh here really is the common form. Another way of saying it: I claim this is grammatically no different from, say, the "thar fóir" in sentence 1286, and in all such cases we have Case=Nom on the noun.
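The convention just described can be written down as a tiny rule. This is only a sketch: the set `DISTINCT_DATIVES` is hypothetical and holds just the two forms cited in this comment, and the function deliberately ignores prepositions that govern the genitive.

```python
# Hypothetical list: only the distinct dative spellings cited in this thread.
DISTINCT_DATIVES = {"leith", "éirinn"}


def expected_case_after_preposition(form: str) -> str:
    """Case value for a noun governed by a simple preposition.

    Case=Dat is used only for a distinct dative spelling; every other noun
    keeps Case=Nom (the old "Common" form).
    """
    return "Dat" if form.lower() in DISTINCT_DATIVES else "Nom"
```

With this rule "Éirinn" comes out as Dat, while "Tigh" and "fóir" both come out as Nom, which is the parallel being drawn with sentence 1286.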

And thanks for the pointer to Lauren's thesis, I will check that out. I do like Foreign=Yes in this case as an "explanation" for the lack of a séimhiú... we had an example like this in our code-switching paper. I think it was "an album nua", where we argued that this was an English word because there would be a mutation if this were the borrowed word "albam". All that said, if you're happier backing this change out, I can do that until we have a chance to look at these broadly/systematically (or get Lauren's input).