davidzbiral opened 6 months ago
@davidzbiral
This is not a precisely defined rule. What would the order be if TEXT2
were selected in the example below?
<a>text1 <c> TE<d>XT<b>2 </a> text3 </b> text4 text5 text 6</c>text 7</d>
I think we need to be more specific. What about sorting by entity class and then alphabetically? I think that would make it easier to find a specific entity.
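A minimal Python sketch of the class-then-alphabet ordering, assuming each label carries a short entity-class code (e.g. "L", "S") and a display name. The `EntityLabel` type and its fields are hypothetical, not the application's data model. Class codes are compared alphabetically here; a fixed class order would need an explicit ranking instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityLabel:
    # Hypothetical label record: entity-class code plus display name.
    entity_class: str
    name: str


def sort_by_class_then_name(labels):
    """Group labels by entity class, then sort names alphabetically (case-insensitive) within each class."""
    return sorted(labels, key=lambda e: (e.entity_class, e.name.casefold()))


labels = [EntityLabel("S", "statement 3"), EntityLabel("L", "Lombardy"), EntityLabel("L", "Florence")]
print([e.name for e in sort_by_class_then_name(labels)])  # ['Florence', 'Lombardy', 'statement 3']
```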
@adammertel We are operating on a word (word-token) basis, so the example is somewhat artificial (even if the application allows it). But that's just a small note, and there are languages where we will need character-level anchoring.
Let me reformulate completely; my request was not ideal. For the anchors that start in the selected span, sort by order of appearance, i.e. a, c, d, b. Or, to make it simpler (perhaps this is how you already do it? Sorry, no time to open the app now and check): always sort by order of appearance in the full text, both for the anchors that start outside the span and for those that start within it. That is, first you will see the whole full text, then the subT, then the subsubT, then the S, then e.g. a Location within that statement, etc. In other words, let's follow the order of appearance of the start tags.
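A minimal Python sketch of the "order of appearance of the start tag" rule, assuming each anchor is stored as a half-open character range [start, end) over the text with the tags removed. `Anchor` and `anchors_for_selection` are hypothetical names. The offsets below are computed from the earlier example, where TEXT2 occupies [7, 12).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    label: str
    start: int  # offset of the start tag in the untagged text
    end: int    # offset of the end tag (exclusive)


def anchors_for_selection(anchors, sel_start, sel_end):
    """Anchors overlapping [sel_start, sel_end), in order of appearance of their start tags."""
    overlapping = [a for a in anchors if a.start < sel_end and sel_start < a.end]
    # Earlier start first; on equal starts the longer (outer) anchor's start tag comes first.
    # sorted() is stable, so identical spans keep their input (document) order.
    return sorted(overlapping, key=lambda a: (a.start, -a.end))


# <a>text1 <c> TE<d>XT<b>2 </a> text3 </b> text4 text5 text 6</c>text 7</d>
example = [Anchor("a", 0, 13), Anchor("b", 11, 20), Anchor("c", 6, 39), Anchor("d", 9, 45)]
print([a.label for a in anchors_for_selection(example, 7, 12)])  # ['a', 'c', 'd', 'b']
```

Anchors that both start outside the selection and start inside it are included, as long as they overlap it; the input list is assumed to already be in document order so that ties between identical spans come out outer-first.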
This means you need a rule for deciding whether a new anchor on a span that already has one goes inside or outside the existing one, if you get what I mean. E.g. if "Lombardia" is already enclosed in anchors of L Lombardy and I select it again, should the new anchors go inside or outside? I think inside. Definitely no crossing anchors (which would make the XML invalid, completely unnecessarily).
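A minimal Python sketch of the "inside, never crossing" rule, assuming the same half-open [start, end) ranges over the untagged text and an anchor list kept in document order of start tags. `Anchor`, `crosses`, `insert_innermost` and `render` are hypothetical helpers, not the application's API. Two ranges cross when each contains exactly one boundary of the other, which is exactly the case that makes the XML invalid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    label: str
    start: int  # offset of the start tag in the untagged text
    end: int    # offset of the end tag (exclusive)


def crosses(a, b):
    """True if the two ranges partially overlap, i.e. their tags would cross."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


def insert_innermost(anchors, new):
    """Insert `new` into a document-ordered list, nested inside any anchor with the same span."""
    if any(crosses(new, a) for a in anchors):
        raise ValueError(f"{new.label} would cross an existing anchor")
    # Nesting order: earlier start first, longer (outer) range first on equal starts.
    # Using <= skips past existing anchors with an identical span, so `new` lands inside them.
    key = lambda a: (a.start, -a.end)
    i = 0
    while i < len(anchors) and key(anchors[i]) <= key(new):
        i += 1
    return anchors[:i] + [new] + anchors[i:]


def render(text, anchors):
    """Serialize well-nested, document-ordered anchors back into tagged text."""
    opens, closes = {}, {}
    for a in anchors:
        opens.setdefault(a.start, []).append(a.label)
        closes.setdefault(a.end, []).insert(0, a.label)  # later (inner) anchors close first
    out = []
    for i in range(len(text) + 1):
        out += [f"</{label}>" for label in closes.get(i, [])]
        out += [f"<{label}>" for label in opens.get(i, [])]
        if i < len(text):
            out.append(text[i])
    return "".join(out)


text = "in Lombardia now"
anchors = insert_innermost([Anchor("loc", 3, 12)], Anchor("new", 3, 12))
print(render(text, anchors))  # in <loc><new>Lombardia</new></loc> now
```

Rejecting a crossing anchor outright is one design choice; the tool could instead ask the user to adjust the selection.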
@adammertel So do whatever is appropriate with this issue; I have appended a new topic: enclosing the same span in multiple anchors.
Oh, and Adam, I think we should prevent creating differences by whitespace alone, i.e. it should probably not be possible to do
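One possible safeguard, sketched in Python under the assumption that "differences by whitespace alone" means anchors whose spans differ only by leading or trailing spaces: trim the selection before creating the anchor, so such selections collapse to the same span. `trim_selection` is a hypothetical helper working on half-open [start, end) offsets.

```python
def trim_selection(text, start, end):
    """Shrink the half-open range [start, end) so it neither starts nor ends with whitespace."""
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end


text = "in Lombardia now"
# Selecting " Lombardia " or "Lombardia" yields the same span.
print(trim_selection(text, 2, 13), trim_selection(text, 3, 12))  # (3, 12) (3, 12)
```

A selection made only of whitespace shrinks to an empty range (start == end), which the caller could simply reject.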
Should I create a new issue describing these two things, so that you can then decide what needs to be done with the original issue?
Sort the entity labels spanning the active text from the inside outwards, i.e. from the closest to the farthest: e.g. from a Location to the Statement it is in, to the immediate subT, to the parent subT, to the grandparent subT, etc.
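A minimal Python sketch of the inside-out ordering, assuming well-nested anchors stored as half-open [start, end) ranges over the untagged text. `Anchor` and `labels_inside_out` are hypothetical, and the offsets are made up for illustration. For well-nested anchors that all contain the active text, inside-out is simply the reverse of the start-tag order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    label: str
    start: int
    end: int


def labels_inside_out(anchors, sel_start, sel_end):
    """Labels of anchors fully containing [sel_start, sel_end), closest first."""
    containing = [a for a in anchors if a.start <= sel_start and sel_end <= a.end]
    # Start-tag order: earlier start first, outer first on equal starts; the stable sort keeps
    # document order for identical spans. Reversing that order puts the innermost anchor first.
    outside_in = sorted(containing, key=lambda a: (a.start, -a.end))
    return [a.label for a in reversed(outside_in)]


# Hypothetical offsets for the hierarchy mentioned in the thread.
doc = [
    Anchor("full text", 0, 100),
    Anchor("subT", 10, 90),
    Anchor("subsubT", 15, 70),
    Anchor("S", 20, 60),
    Anchor("L", 30, 39),
]
print(labels_inside_out(doc, 30, 39))  # ['L', 'S', 'subsubT', 'subT', 'full text']
```

`reversed()` is used rather than `sorted(..., reverse=True)` because the latter keeps equal keys in their original order, which would list two anchors on an identical span outer-first.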