clowder-framework / clowder

A data management system that allows users to share, annotate, organize and analyze large collections of datasets. It provides support for extensible metadata annotation using JSON-LD and a distributed analytics event bus for automatic curation of uploaded data.
https://clowderframework.org/
University of Illinois/NCSA Open Source License
33 stars, 17 forks

Change how nested objects are indexed #404

Closed: max-zilla closed this 10 months ago

max-zilla commented 1 year ago

Description

This is a proposed fix for a bug discovered in Clowder's process for indexing extractor metadata into Elasticsearch. In some cases where arrays were used, the previous code would inadvertently cast nested JSON objects to long JSON strings. This PR modifies the indexer to retain the JSON structure. Features like ES type inference (double vs. string, for example) are maintained.
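The difference between the two behaviors can be sketched in a few lines of Python. This is only an illustration of the symptom and the intended result, not the actual Clowder indexer code; both function names are hypothetical, and the sample values are taken from the test metadata in this thread.

```python
import json


def to_index_value_stringified(value):
    """Hypothetical illustration of the symptom: objects inside arrays become JSON strings."""
    if isinstance(value, dict):
        return {k: to_index_value_stringified(v) for k, v in value.items()}
    if isinstance(value, list):
        # Each object element is serialized, so its keys are no longer separate fields.
        return [json.dumps(v) if isinstance(v, dict) else v for v in value]
    return value


def to_index_value_structured(value):
    """Hypothetical illustration of the intended result: objects in arrays stay objects."""
    if isinstance(value, dict):
        return {k: to_index_value_structured(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_index_value_structured(v) for v in value]
    # Scalars are passed through untouched, so a number is still a number
    # when the search engine infers its type.
    return value


content = {"Predictions": [{"class_name": "n01682714", "score": 0.7607384}]}
stringified = to_index_value_stringified(content)
structured = to_index_value_structured(content)
```

In the first result the array element is a single string, so a key such as `score` cannot be queried on its own. In the second it remains a numeric field inside an object.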

Affected instances would need to do the following to refresh/correct the search index:

POST /api/deleteindex
POST /api/reindex (this does not delete the index first; that must be done manually)
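A minimal sketch of issuing those two calls in order, using only the Python standard library. The base URL and the `key` query parameter used for authentication are assumptions for illustration and should be adapted to the instance's actual setup. The helper names are hypothetical.

```python
from urllib.request import Request, urlopen


def build_refresh_requests(base_url, api_key):
    """Build the two POST requests in the required order: delete first, then reindex."""
    paths = ["/api/deleteindex", "/api/reindex"]
    root = base_url.rstrip("/")
    # Assumption: the instance accepts an API key as a `key` query parameter.
    return [
        Request(f"{root}{path}?key={api_key}", data=b"", method="POST")
        for path in paths
    ]


def refresh_index(base_url, api_key):
    """Send the requests one after the other; urlopen raises on an HTTP error status."""
    for request in build_refresh_requests(base_url, api_key):
        with urlopen(request) as response:
            print(request.full_url.split("?")[0], response.status)


# Building the requests does not contact the server.
requests_to_send = build_refresh_requests("http://localhost:9000/", "YOUR_API_KEY")
```

Calling `refresh_index("http://localhost:9000", "YOUR_API_KEY")` would then send both requests against a running instance.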

lmarini commented 10 months ago

To test, upload extracted metadata containing an array of JSON objects, then search for keys in those objects.

lmarini commented 10 months ago

Tested with:

```json
{
    "@context": [
        "https://clowder.ncsa.illinois.edu/contexts/metadata.jsonld",
        {
            "Predictions": "http://clowder.ncsa.illinois.edu/metadata/ncsa.tensorflow-parallel-dataset-image-classification#Predictions",
            "class_name": "http://clowder.ncsa.illinois.edu/metadata/ncsa.tensorflow-parallel-dataset-image-classification#Predictions.class_name",
            "class_description": "http://clowder.ncsa.illinois.edu/metadata/ncsa.tensorflow-parallel-dataset-image-classification#Predictions.class_description",
            "score": "http://clowder.ncsa.illinois.edu/metadata/ncsa.tensorflow-parallel-dataset-image-classification#Predictions.score"
        }
    ],
    "agent": {
        "@type": "cat:extractor",
        "name": "ncsa.tensorflow-parallel-dataset-image-classification",
        "extractor_id": "https://clowder.ncsa.illinois.edu/clowder/extractors/ncsa.tensorflow-parallel-dataset-image-classification/2.3"
    },
    "content": {
        "Predictions": [
            {
                "class_name": "n01682714",
                "class_prediction": "American_chameleon",
                "score": 0.7607384
            },
            {
                "class_name": "n01693334",
                "class_prediction": "green_lizard",
                "score": 0.21042463
            },
            {
                "class_name": "n01687978",
                "class_prediction": "agama",
                "score": 0.016864877
            }
        ]
    }
}
```
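As a rough aid for the "search for keys in the objects" step, the sketch below lists the dotted key paths that the `content` object in this test payload contains once its nested structure is kept. It is a hypothetical helper for choosing what to search for. The field names Clowder actually writes to the index may be prefixed or mapped differently.

```python
def dotted_key_paths(value, prefix=""):
    """Collect dotted paths for every key, descending through objects and arrays."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= dotted_key_paths(child, path)
    elif isinstance(value, list):
        # Array elements share the path of the array itself.
        for item in value:
            paths |= dotted_key_paths(item, prefix)
    return paths


content = {
    "Predictions": [
        {"class_name": "n01682714", "class_prediction": "American_chameleon", "score": 0.7607384},
        {"class_name": "n01693334", "class_prediction": "green_lizard", "score": 0.21042463},
        {"class_name": "n01687978", "class_prediction": "agama", "score": 0.016864877},
    ]
}

paths = sorted(dotted_key_paths(content))
for path in paths:
    print(path)
```

With the stringified form described in the PR, only `Predictions` would exist as a key, which is why searching on the inner keys is a useful check.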