HumanSignal / label-studio-converter

Tools for converting Label Studio annotations into common dataset formats
https://labelstud.io/

feat: LEAP-608: Support Repeater in JSON-MIN exports #268

Closed jombooth closed 9 months ago

jombooth commented 9 months ago

Repeater annotations previously didn't work with the JSON-MIN export because their results carry indexed from_names like my_tag_0, my_tag_1, produced by the index variable in Repeater configs, e.g.:

<View>
  <Repeater on="$images" indexFlag="{{idx}}" mode="pagination">
    <Image name="page_{{idx}}" value="$images[{{idx}}].url"/>
    <Header value="Utterance Review"/>
    <RectangleLabels name="labels_{{idx}}" toName="page_{{idx}}">
      <Label value="Document Title" />
      <Label value="Document Date" />
    </RectangleLabels>
    <Taxonomy 
      name="categories_{{idx}}"
      toName="page_{{idx}}"
      perRegion="true"
      visibleWhen="region-selected"
    >
      <Choice value="Archaea"/>
      <Choice value="Bacteria"/>
      <Choice value="Eukarya">
        <Choice value="Human"/>
        <Choice value="Opossum"/>
        <Choice value="Extraterrestrial"/>
      </Choice>
    </Taxonomy>
  </Repeater>
</View>

Before this PR, if an annotation result's from_name didn't exactly match one of the output tags in the labeling config (here, labels_{{idx}} and categories_{{idx}}), the result was skipped. However, the labeling configs LSC receives from the Label Studio frontend do carry information about variables like {{idx}} in their 'regex' key (see label_config_repeater.json); this PR enables LSC to recognize from_names in annotation results that match these regexes.
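The matching idea can be sketched as follows. This is a minimal illustration, not LSC's actual implementation: the function names are hypothetical, and it assumes the `{{idx}}` placeholder should match a run of digits.

```python
import re

# Hypothetical sketch of the regex-based matching described above.
# Turn a Repeater output tag like "labels_{{idx}}" into a regex that
# matches concrete from_names such as "labels_0", "labels_1", ...
def repeater_tag_to_regex(tag_name: str) -> re.Pattern:
    # Escape the tag literally, then swap the escaped {{idx}} placeholder
    # for a digit-matching group.
    pattern = re.escape(tag_name).replace(re.escape("{{idx}}"), r"(\d+)")
    return re.compile(f"^{pattern}$")

def matches_output_tag(from_name: str, output_tags: list[str]) -> bool:
    # A result's from_name is kept if it matches any output tag,
    # either exactly or via the Repeater regex.
    return any(
        from_name == tag or repeater_tag_to_regex(tag).match(from_name) is not None
        for tag in output_tags
    )
```

Under this sketch, `matches_output_tag("labels_3", ["labels_{{idx}}", "categories_{{idx}}"])` would be true, so the result is no longer skipped.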

All LSC tests have been restored and refactored to run with pytest, and test coverage has been added for the JSON-MIN export.

$ pytest tests/
================================== test session starts ==================================
platform linux -- Python 3.8.18, pytest-7.2.2, pluggy-1.3.0
rootdir: /home/jo/Repos/label-studio-converter
plugins: mock-1.10.3, django-4.1.0, xdist-2.5.0, forked-1.6.0, env-0.6.2, anyio-4.2.0, cov-2.12.1, tavern-2.3.0, requests-mock-1.5.2
collected 19 items                                                                      

tests/test_brush.py ..                                                            [ 10%]
tests/test_export_conll.py .........                                              [ 57%]
tests/test_export_csv.py ..                                                       [ 68%]
tests/test_export_json_min.py ..                                                  [ 78%]
tests/test_export_yolo.py ..                                                      [ 89%]
tests/test_import_yolo.py ..                                                      [100%]

================================== 19 passed in 0.43s ===================================
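A test in this style might look like the following sketch. The data and helper here are illustrative, not the PR's actual test code: it just checks that results with indexed from_names survive a JSON-MIN-style filtering pass while unmatched tags are still dropped.

```python
import re

# Hypothetical helper: does a from_name match any output tag's
# {{idx}} pattern (digits assumed for the placeholder)?
def tag_matches(from_name: str, output_tags: list[str]) -> bool:
    for tag in output_tags:
        pattern = "^" + re.escape(tag).replace(re.escape("{{idx}}"), r"\d+") + "$"
        if re.match(pattern, from_name):
            return True
    return False

def test_indexed_from_names_are_kept():
    # Results as a Repeater annotation might produce them.
    results = [
        {"from_name": "labels_0", "value": {"rectanglelabels": ["Document Title"]}},
        {"from_name": "labels_1", "value": {"rectanglelabels": ["Document Date"]}},
        {"from_name": "stray_tag", "value": {}},
    ]
    output_tags = ["labels_{{idx}}", "categories_{{idx}}"]
    kept = [r for r in results if tag_matches(r["from_name"], output_tags)]
    # The indexed results are kept; the unmatched tag is filtered out.
    assert [r["from_name"] for r in kept] == ["labels_0", "labels_1"]
```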
codecov-commenter commented 9 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

:exclamation: No coverage uploaded for pull request base (master@8a93eb6). Click here to learn what that means.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##             master     #268   +/-   ##
=========================================
  Coverage          ?   48.21%
=========================================
  Files             ?       22
  Lines             ?     1823
  Branches          ?        0
=========================================
  Hits              ?      879
  Misses            ?      944
  Partials          ?        0
```

:umbrella: View full report in Codecov by Sentry.

jombooth commented 9 months ago

The black lint job fails on files that weren't changed in this PR. To avoid a noisy diff, I'm not planning to run black on the LSC code here; we can do that in a follow-up if desired.