dkpro / dkpro-cassis

UIMA CAS processing library written in Python
https://pypi.org/project/dkpro-cassis/
Apache License 2.0

Sort order for selects and indexes #247

Open reckart opened 2 years ago

reckart commented 2 years ago

Describe the bug

The normal sort order for selects and annotation indexes in UIMA is begin (ascending), then end (descending). However, cassis sorts by begin (ascending), then end (ascending). In addition, when a select returns multiple types, the results come back as one block per type instead of being sorted by offsets across all types.

To Reproduce

    from cassis import load_cas_from_xmi
    from cassis.typesystem import TYPE_NAME_ANNOTATION

    # The DocumentAnnotation (begin=0) should sort before the Annotation (begin=100).
    xmi = """<?xml version="1.0" encoding="UTF-8"?>
        <xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:tcas="http:///uima/tcas.ecore"
            xmlns:cas="http:///uima/cas.ecore" xmi:version="2.0">
            <cas:NULL xmi:id="0" />
            <tcas:DocumentAnnotation xmi:id="2" sofa="1" begin="0" end="4" language="en" />
            <tcas:Annotation xmi:id="3" sofa="1" begin="100" end="200" />
            <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text" sofaString="Test" />
            <cas:View sofa="1" members="2 3" />
        </xmi:XMI>"""

    cas = load_cas_from_xmi(xmi)
    # Selecting the base annotation type also returns its subtypes.
    for a in cas.select(TYPE_NAME_ANNOTATION):
        print(a)

This prints the Annotation (begin=100) before the DocumentAnnotation (begin=0).

Expected behavior

Select results should be sorted by begin (ascending), then end (descending), at least for annotations.
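To make the expected ordering concrete, here is a minimal sketch of a sort key for begin (ascending), then end (descending). It is not cassis code: `Span` and `uima_sort_key` are hypothetical names used only for illustration, and the sketch ignores the further tie-breakers that UIMA applies (such as type priorities).

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for an annotation; not a cassis class."""
    type_name: str
    begin: int
    end: int


def uima_sort_key(span):
    # begin ascending, then end descending (longer spans first at the same begin)
    return (span.begin, -span.end)


spans = [
    Span("uima.tcas.Annotation", 100, 200),
    Span("uima.tcas.DocumentAnnotation", 0, 4),
    Span("uima.tcas.Annotation", 0, 2),
]

# Sorting the combined list, rather than each type separately,
# interleaves the types by offset.
ordered = sorted(spans, key=uima_sort_key)
for span in ordered:
    print(span.type_name, span.begin, span.end)
```

Because the key is applied to all results at once, annotations of different types are ordered by their offsets instead of appearing as one block per type.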

Please complete the following information: