goodmami opened this issue 6 years ago
`_koto_n_nom` can perhaps just be dropped, or be added to the auto-include set for my extractor (a predicate in that set gets included in an extracted transfer rule even if it didn't exist in the predicate alignment, as long as it is incorporated into the rest of the MRS fragment).
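The auto-include behaviour can be sketched roughly as follows. This is a minimal sketch, not the extractor's actual code: it assumes EPs are simplified to a predicate plus role/variable pairs, the `EP` class and `extract_fragment` helper are hypothetical, and "incorporated" is approximated as sharing any variable or label with the fragment (HCONS/qeq links are ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EP:
    """Simplified elementary predication: predicate plus (role, variable) pairs."""
    pred: str
    args: tuple  # e.g. (("LBL", "h1"), ("ARG0", "x2"))

    def variables(self):
        return {var for _, var in self.args}


def extract_fragment(eps, aligned_preds, auto_include):
    """Select EPs for an extracted rule (hypothetical helper).

    Aligned EPs seed the fragment. Auto-include EPs are added only if they
    share a variable with the fragment; this repeats until nothing changes,
    so one auto-included EP can connect another.
    """
    fragment = [ep for ep in eps if ep.pred in aligned_preds]
    if not fragment:
        return []
    connected = set().union(*(ep.variables() for ep in fragment))
    candidates = [ep for ep in eps
                  if ep.pred in auto_include and ep not in fragment]
    changed = True
    while changed:
        changed = False
        for ep in list(candidates):  # iterate over a copy; we remove from it
            if ep.variables() & connected:
                fragment.append(ep)
                connected |= ep.variables()
                candidates.remove(ep)
                changed = True
    return fragment
```

An auto-include EP that shares nothing with the fragment (directly or through another auto-included EP) is left out, which is the "as long as it is incorporated" condition.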
`unspec_adj` and `degree` have the same count because they always co-occur. There should be a general rule or two written for these. Maybe:
```
;;; e.g. 2 キロ の 水 -- 2 kilograms of water
degree+unspec_adj--noun+of_p := monotonic_mtr &
[ INPUT.RELS < #m [ LBL #h1, ARG0 #x2 ],
               [ PRED "ja:degree", LBL #h3, ARG1 #e4, ARG2 #x2 ],
               [ PRED "ja:unspec_adj", LBL #h3, ARG0 #e4, ARG1 #x3 ],
               [ LBL #h3, ARG0 #x3 ] >,
  OUTPUT.RELS < #m [ LBL #h1, ARG0 #x2 ],
                [ PRED "_of_p_rel", LBL #h1, ARG1 #x2, ARG2 #x3 ],
                [ ARG0 #x3 ] > ].
```
And a different rule would be needed for the `generic_entity` case, although the rule above might be broken...
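Whether or not the TDL above is well-formed, the mapping it is aiming for can be sketched as a plain rewrite over a flat list of EPs. This is only an illustration under assumptions, not how MTR rules are applied (no unification, no HCONS, no CONTEXT/FILTER): EPs are dicts, the function names are hypothetical, and the second noun is left untouched rather than re-emitted as in the rule.

```python
def find_ep(eps, **features):
    """Return the first non-quantifier EP whose features all match, else None."""
    for ep in eps:
        # Quantifiers share ARG0 with their noun, so skip them (they carry RSTR).
        if "RSTR" not in ep and all(ep.get(k) == v for k, v in features.items()):
            return ep
    return None


def rewrite_degree_unspec_adj(eps):
    """Replace each ja:degree + ja:unspec_adj pair with an _of_p_rel EP.

    Hypothetical sketch: EPs are dicts of features; HCONS and the second
    noun's label are not checked.
    """
    consumed = set()  # id()s of EPs matched by the rewrite
    added = []
    for deg in eps:
        if deg.get("pred") != "ja:degree":
            continue
        if not all(k in deg for k in ("LBL", "ARG1", "ARG2")):
            continue
        # The unspec_adj whose event degree modifies, sharing degree's label.
        adj = find_ep(eps, pred="ja:unspec_adj", ARG0=deg["ARG1"], LBL=deg["LBL"])
        # The measure noun (e.g. the one for キロ) that is degree's ARG2.
        measure = find_ep(eps, ARG0=deg["ARG2"])
        if adj is None or measure is None or "ARG1" not in adj:
            continue
        consumed.update((id(deg), id(adj)))
        added.append({"pred": "_of_p_rel", "LBL": measure.get("LBL"),
                      "ARG1": measure["ARG0"], "ARG2": adj["ARG1"]})
    return [ep for ep in eps if id(ep) not in consumed] + added
```

As in the rule, `_of_p_rel` takes the measure noun's label and links the measure noun (ARG1) to the modified noun (ARG2).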
`_you_n` would be tough to write rules for, but maybe it can just be dropped.
From the ACE generation (ERG) log files in a translation pipeline:
This is a partial list. On the left are the occurrence counts. It's not surprising that Jacy predicates are not covered by the ERG, but when they are very frequent it means that JaEn should perhaps have a hand-built rule to catch the cases when the automatically extracted rules fail to transfer something. In some cases, there is such a rule, but it has become outdated. For instance, `neg_x` is not covered because JaEn's rule still targets `neg_v`. Similarly, JaEn targets `coord` instead of `coord_c`.

And here are some of those that aren't covered on the ERG side:
There are some other reasons for these, but generally it's also because the hand-built JaEn rules are out of date. The `def_q` and `implicit_q` ones are because the modified SEM-I for the ERG missed them.
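The diagnosis above (hand-built rules that mention predicates the current grammar no longer has, with frequency as a priority signal) could be checked mechanically. A minimal sketch, assuming the predicates each hand-built rule matches or produces, and the SEM-I predicate inventories, have already been collected into Python sets; the function names, rule names, and counts are hypothetical. It would be run once with rule inputs against Jacy's SEM-I and once with rule outputs against the ERG's.

```python
from collections import Counter


def stale_predicates(rules, semi_preds):
    """Map rule name -> sorted predicates the rule mentions that the SEM-I lacks."""
    report = {}
    for name, preds in rules.items():
        missing = sorted(p for p in preds if p not in semi_preds)
        if missing:
            report[name] = missing
    return report


def frequent_uncovered(counts, threshold):
    """(predicate, count) pairs seen at least `threshold` times, most frequent first."""
    return [(p, n) for p, n in Counter(counts).most_common() if n >= threshold]


# Illustrative data only; not the counts from the logs above.
jacy_rule_inputs = {"neg-rule": {"neg_v"}, "coord-rule": {"coord"}}
jacy_semi = {"neg_x", "coord_c"}
print(stale_predicates(jacy_rule_inputs, jacy_semi))
# {'neg-rule': ['neg_v'], 'coord-rule': ['coord']}
```

Ranking by frequency first keeps the hand-built rule effort on the predicates that most often block generation.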