UniversalDependencies / UD_Irish-IDT

Irish data

Incorrect POS tags for "ach" #149

Open kscanne opened 2 years ago

kscanne commented 2 years ago

In the current version of the treebank, "ach" is always tagged as a conjunction (SCONJ in most cases and a few CCONJ). In many cases it should be ADP... see for example sentences 911, 948, 964, 1350, etc. in the training file. Foclóir Uí Dhónaill agrees... listed as "conj. & prep.": https://www.teanglann.ie/en/fgb/ach

I count 446 examples that will need review.
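A count like this can be reproduced with a short script. The sketch below is only an illustration, not the method used for the figure above: it assumes the treebank is in the standard 10-column, tab-separated CoNLL-U layout (FORM in column 2, UPOS in column 4), and the function name `count_upos` is made up for this example.

```python
from collections import Counter


def count_upos(conllu_text, form="ach"):
    """Count the UPOS tags of every token whose FORM equals `form` (case-insensitive)."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines such as "# sent_id = ..."
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip malformed lines, multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[1].lower() == form:
            counts[cols[3]] += 1
    return counts
```

Reading the training file into a string and passing it to `count_upos` gives the per-tag breakdown; the total of the returned counts is the number of tokens to review.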

laurenCassidy commented 2 years ago

I agree. Further, which POS tag and dependency relation do you think would be best for 'ach' when it is used with a negative verb to mean 'only'?

e.g. 'Níl ann ach ceann amháin'

Possibly adverb or particle?

kscanne commented 2 years ago

I read "ach ceann amháin" as a PP, so "ach" would be ADP here. Same with examples like "Ní ithim ach sceallóga" or (sentence 948) "...níor mhair an cumann beag carthanach seo ach seal".

tlynn747 commented 2 years ago

The SCONJ/CCONJ POS tagging comes from Elaine's tags in the original gold-standard POS-tagged part of the NCI. I don't mind if they're changed as long as it's backed up linguistically.

However, re the dependency relation, I don't see why particle cannot continue to be used... it's been like that since the IDT and never caused confusion in annotation or parsing. I chose "particle" based on theoretical linguistic research at the time. See excerpt of thesis:

[Image: screenshot of the thesis excerpt, 2022-04-26]

laurenCassidy commented 2 years ago

It does seem that 'ach' is dependent on the verb for the 'only' constructions, e.g. 'ní ithim ach sceallóga' vs. '*ithim ach sceallóga'. I wouldn't say that 'ach sceallóga' is a prepositional phrase; rather, the 'ach' is modifying the VP.

kscanne commented 2 years ago

Sorry, I was being dense re: PP. Here's another example that illustrates this I think: "ní ithim dinnéar ach ar an tolg".

In any case, the annotations in the current version of the treebank don't always match the exemplars Teresa posted above; see sentence 948 in training ("...níor mhair an cumann beag carthanach seo ach seal"), sentence 1755 ("...ní raibh ach méadú 1% ar an líon cásanna a críochnaíodh"), etc. where "ach" is dependent on the following noun.
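Attachments like these can be listed mechanically before any manual review. The sketch below is a rough illustration under the same assumption of standard 10-column CoNLL-U input (HEAD in column 7, DEPREL in column 8); `ach_attachments` is a hypothetical helper name, and the script only reports what the treebank currently contains, it does not decide which analysis is right.

```python
def ach_attachments(conllu_text, form="ach"):
    """Return (sent_id, upos, deprel, head_upos) for each token matching `form`."""
    results = []
    # Sentences in CoNLL-U are separated by a blank line
    for block in conllu_text.strip().split("\n\n"):
        sent_id, tokens = None, {}
        for line in block.splitlines():
            if line.startswith("# sent_id ="):
                sent_id = line.split("=", 1)[1].strip()
            elif line and not line.startswith("#"):
                cols = line.split("\t")
                # Keep ordinary tokens only (no multiword ranges or empty nodes)
                if len(cols) == 10 and cols[0].isdigit():
                    tokens[cols[0]] = (cols[1], cols[3], cols[6], cols[7])
        for tok_form, upos, head, deprel in tokens.values():
            if tok_form.lower() == form:
                # HEAD "0" marks the root, which has no governing token
                head_upos = tokens[head][1] if head in tokens else "ROOT"
                results.append((sent_id, upos, deprel, head_upos))
    return results
```

Filtering the result for rows whose last field is not `VERB` gives a candidate list of sentences where "ach" hangs off a noun or some other non-verbal head.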

michealjohnny commented 2 years ago

I believe the SCONJ or CCONJ tags are correct in a number of cases; a good few of the usage examples in FGB (https://www.teanglann.ie/ga/fgb/ach) are one or other of these, imo. E.g. from FGB: "Beir air [ach] ná bris é", catch it but don't break it. -> CCONJ

@kscanne has hit the nail on the head: there are additional usages for this token beyond the exemplars. I agree with @laurenCassidy that "ach" is dependent on the verb in the examples given above, typically where "ach" translates as "only", "but" or "except" (see sense 2 of "ach" in FGB for examples of "except": https://www.teanglann.ie/ga/fgb/ach).

Example 1. Ní ithim ach sceallóga - I only eat chips.
Example 2. Ní raibh ach méadú 1% ar an líon cásanna a críochnaíodh - There was but a 1% increase in the number of cases completed.
Example 3 (taken from FGB). "Nach bhfuil leat [ach] an leabhar seo? Have you brought nothing but this book?" (Another translator might use "except" instead of "but" in this translation.)

If I understand the guidelines correctly then I would support a change to ADP, as ye have recommended, for usages like examples 1, 2, 3.

Let me know if you decide to go ahead with a review of these and I'll lend a hand with the manual checking, as it appears that this cannot be automated.

tlynn747 commented 2 years ago

My illustrations above were to demonstrate that the "ach" should be attached to the verb. If there were mistakes made by the annotators in the attachments, then please go ahead and fix them.

As for POS-tags, I'm not attached to SCONJ or CCONJ on any of those and accept that errors crept through from the original corpus. But I'm not 100% convinced of all of these examples being ADP.

Example 1: Is your suggestion that it's just an ADP attached to the verb as a standalone PP attachment? Or is it joined with sceallóga as a complex PP?

I'd argue for Example 1 as being ADV... modifying the verb, and that sceallóga is simply a NOUN as obj.

michealjohnny commented 2 years ago

"I'd argue for Example 1 as being ADV... modifying the verb, and that sceallóga is simply a NOUN as obj." Agreed, thanks @tlynn747. I had read the ADP docs but hadn't factored everything in when posting above; of course ADV is what we're after for the verb modifier.