MedKhem closed this 7 years ago
Since a single model does not support nested structures, I recommend using two different models for this case:
The first model, "form", structures the \<form> block into \<orth>, \<pron> and \<gramGrp> blocks. The \<orth> and \<pron> blocks are then gathered under a \<form> element, and the \<gramGrp> block is further segmented by the second model.
The second model, "grammatical_group", segments the \<gramGrp> block. For the moment, we use just 4 labels for this model: \<pos>, \<tns>, \<gen> and \<number>.
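To make the two-step cascade concrete, here is a minimal Python sketch. It is not the actual implementation: `toy_gram_labeler` is a hypothetical lookup-based stand-in for the "grammatical_group" model, the input segments are made-up output of the "form" model, and emitting \<gramGrp> as a sibling of \<form> is only one reading of the description above.

```python
import xml.etree.ElementTree as ET

# Label sets taken from the two model descriptions above.
FORM_LABELS = {"orth", "pron", "gramGrp"}
GRAM_LABELS = {"pos", "tns", "gen", "number"}


def toy_gram_labeler(text):
    """Hypothetical stand-in for the "grammatical_group" model.

    A real model would be a trained sequence labeler; this lookup table
    only exists so the sketch can run end to end.
    """
    lookup = {"n.": "pos", "v.": "pos", "m.": "gen", "f.": "gen",
              "sg.": "number", "pl.": "number", "pres.": "tns"}
    return [(lookup[tok], tok) for tok in text.split() if tok in lookup]


def cascade(form_segments, gram_labeler):
    """Apply the second model to every <gramGrp> found by the first one.

    form_segments: (label, text) pairs, assumed output of the "form" model.
    Returns the <form> element followed by the segmented <gramGrp> elements.
    """
    form = ET.Element("form")
    gram_groups = []
    for label, text in form_segments:
        if label not in FORM_LABELS:
            raise ValueError(f"unexpected form label: {label}")
        if label == "gramGrp":
            # Second level: segment the block with the second model.
            gram_grp = ET.Element("gramGrp")
            for sub_label, sub_text in gram_labeler(text):
                if sub_label not in GRAM_LABELS:
                    raise ValueError(f"unexpected gram label: {sub_label}")
                ET.SubElement(gram_grp, sub_label).text = sub_text
            gram_groups.append(gram_grp)
        else:
            # First level: <orth> and <pron> are gathered under <form>.
            ET.SubElement(form, label).text = text
    return [form] + gram_groups


# Made-up segments, for illustration only.
segments = [("orth", "cat"), ("pron", "kat"), ("gramGrp", "n. sg.")]
result = [ET.tostring(el, encoding="unicode")
          for el in cascade(segments, toy_gram_labeler)]
```

Because the second model is passed in as a plain callable, the same function could be reused wherever a \<gramGrp> block shows up, without the first-level model knowing anything about the second-level labels.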
Note that the "lexical-entry" model could be used first to segment the \<re> block, after which the "form" model could be applied. Before using the "lexical-entry" model this way, I would suggest changing its feature generation to add a feature that distinguishes a lexical entry from a related entry, since there are some specific lexical differences between them.
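As a rough illustration of the suggested feature, the sketch below appends one extra column to each token's feature line. Everything here is an assumption made for the example: the function names, the other feature columns and the `RE`/`ENTRY` values are hypothetical and do not reflect the existing feature generation code.

```python
def token_features(token, in_related_entry):
    """Build a toy feature vector for one token.

    The last column is the proposed addition: it tells the model whether
    the token comes from a related entry or from a regular lexical entry.
    """
    is_punct = bool(token) and all(not c.isalnum() for c in token)
    return [
        token,
        token.lower(),
        "CAP" if token[:1].isupper() else "NOCAP",
        "PUNCT" if is_punct else "NOPUNCT",
        "RE" if in_related_entry else "ENTRY",  # hypothetical new feature
    ]


def feature_lines(tokens, in_related_entry):
    """Return one space-separated feature line per token."""
    return [" ".join(token_features(t, in_related_entry)) for t in tokens]


# The same tokens yield different lines depending on their origin.
entry_lines = feature_lines(["Cat", ","], in_related_entry=False)
related_lines = feature_lines(["Cat", ","], in_related_entry=True)
```

With such a column, one "lexical-entry" model could in principle learn separate patterns for the two contexts instead of requiring a dedicated model for \<re>.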
Extracting \<gramGrp> within a sense should be handled by a "sense_gram" model (to get \<gramGrp> and \<sense> blocks). To segment \<gramGrp>, the same "grammatical_group" model used for the \<form> block could be used for the \<sense> block.
This is about extracting all the morphological and grammatical information from the previous level. This information can appear directly in the \<entry> under the extracted \<form> block, in the \<sense> block and/or in the \<re> block.
In the case of \<entry>, the following example:
becomes
becomes
becomes