tlynn747 closed 3 years ago
These are also missing from cases which are definite but don't have the definite article, like "Rannóg Dóiteáin Nua-Eabhrac" in sentence 934.
Working on a (huge) patch for this. For the record, this looks like it can be 100% automated, presuming the UPOS tags and deprels are all correct. Following the definitions in the Christian Brothers' grammar (pp.25-26), here are the rules:
(1) any nominal head of a word with PronType=Art, or any nominal following "cén", is definite
(2) proper nouns are definite
(3) any nominal head of "gach" or of a word with Poss=Yes is definite
(4) vocative nouns are definite
(5) any nominal qualified by "a dó", "a trí", etc. is definite (this is rare, and also slightly tricky to code, to avoid cases where the number is a different kind of nmod of the noun, like "cruinniú tar éis a dó")
(6) the recursive rule: any nominal which is the head of a definite noun in the genitive case is itself definite
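For anyone following along, rules (1)-(4) and (6) could be sketched roughly as below. This is only an illustration, not the actual patch: the token dictionaries, the `definite_ids` helper, and the annotation of the example phrase are all assumptions made up for the sketch. Rule (5) is left out because of the nmod ambiguity mentioned above, "following cén" is approximated as "immediately preceded by cén", and the Gaeilge/Béarla/Gaeltacht exceptions to rule (6) are not handled.

```python
# Hypothetical sketch: tokens are plain dicts loosely mirroring CoNLL-U columns.
NOMINAL = {"NOUN", "PROPN"}

def definite_ids(tokens):
    """Return the ids of tokens judged definite by rules (1)-(4) and (6)."""
    # Map each head id to the list of its dependents.
    children = {}
    for t in tokens:
        children.setdefault(t["head"], []).append(t)

    definite = set()
    for i, t in enumerate(tokens):
        if t["upos"] not in NOMINAL:
            continue
        deps = children.get(t["id"], [])
        prev = tokens[i - 1] if i > 0 else None
        if (
            # (1) some dependent carries PronType=Art (covers "an", "san", "faoin", ...)
            any(d["feats"].get("PronType") == "Art" for d in deps)
            # (1) "cén" is a head, not a dependent, so look at the preceding word
            #     (assumption: the nominal immediately follows it)
            or (prev is not None and prev["form"].lower() == "cén")
            # (2) proper nouns
            or t["upos"] == "PROPN"
            # (3) dependent "gach" or a possessive
            or any(d["lemma"] == "gach" or d["feats"].get("Poss") == "Yes" for d in deps)
            # (4) vocatives
            or t["feats"].get("Case") == "Voc"
        ):
            definite.add(t["id"])

    # (6) propagate upward from definite genitive dependents until nothing changes.
    changed = True
    while changed:
        changed = False
        for t in tokens:
            if t["upos"] not in NOMINAL or t["id"] in definite:
                continue
            for d in children.get(t["id"], []):
                if (d["upos"] in NOMINAL
                        and d["feats"].get("Case") == "Gen"
                        and d["id"] in definite):
                    definite.add(t["id"])
                    changed = True
                    break
    return definite

# "Rannóg Dóiteáin Nua-Eabhrac" with an assumed (illustrative) annotation.
example = [
    {"id": 1, "form": "Rannóg", "lemma": "rannóg", "upos": "NOUN",
     "feats": {}, "head": 0, "deprel": "root"},
    {"id": 2, "form": "Dóiteáin", "lemma": "dóiteán", "upos": "NOUN",
     "feats": {"Case": "Gen"}, "head": 1, "deprel": "nmod"},
    {"id": 3, "form": "Nua-Eabhrac", "lemma": "Nua-Eabhrac", "upos": "PROPN",
     "feats": {"Case": "Gen"}, "head": 2, "deprel": "nmod"},
]
print(sorted(definite_ids(example)))
```

Under that assumed annotation, the proper noun is definite by (2) and the definiteness climbs to both heads via (6), which is the article-less case from sentence 934.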
There's a subtlety with the recursive rule (6) since it doesn't always hold for nouns like "Gaeilge", "Béarla", or "Gaeltacht" despite these being proper nouns, and thus definite by (2):
Of course, focal/leagan/ceantair here are definite, but that comes from the definite article (which would not be permitted if the definiteness propagated from the genitive noun).
Great that it can be automated. Initially I thought it was due to the buggy morphological retrieval process in v2.5, but it seems that there are tons missing in Elaine's original corpus so it became an issue of error propagation. I only noticed it during the feature comparison with the last batch of predicted/corrected trees. I see some pattern where nouns with initial mutations (in the test file at least) are more likely to be missing the Definite feature.
You can add to that list nouns following "san", "faoin", etc.
The "san", "faoin", "ón", etc. examples fall under the first case (PronType=Art). "cén" has to be treated separately because it doesn't have that feature and the dependency goes in the other direction (it's the head).
Also noticing that case (5) should be expanded slightly to include things like "airteagal 5", "alt (ii)", etc. without the particle "a".
Need to review nouns for this missing feature: Definite=Def
e.g. 1332 (mbord)