Change how nested objects are indexed

max-zilla commented 1 year ago

Description

This is a proposed fix for a bug discovered in Clowder's process for indexing extractor metadata into Elasticsearch. In some cases where arrays were used, the previous code would inadvertently cast nested JSON objects to long JSON strings. This PR modifies the indexer to retain the JSON structure. Features like ES type inference (double vs. string, for example) are maintained.
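For reviewers, the difference between the two behaviors can be sketched roughly as follows. This is not the Clowder indexer code (which is not shown here); the function names are hypothetical, and the assumption that the old path stringified objects found inside arrays is my reading of the description above.

```python
import json


def to_index_doc_stringified(value):
    """Hypothetical stand-in for the old behavior: objects inside arrays
    are serialized to JSON strings, so their keys are not indexed as fields."""
    if isinstance(value, dict):
        return {k: to_index_doc_stringified(v) for k, v in value.items()}
    if isinstance(value, list):
        # Nested objects collapse into opaque strings here.
        return [json.dumps(v) if isinstance(v, dict) else v for v in value]
    return value


def to_index_doc_structured(value):
    """Hypothetical stand-in for the fixed behavior: recurse into arrays
    and keep nested objects (and their value types) intact."""
    if isinstance(value, dict):
        return {k: to_index_doc_structured(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_index_doc_structured(v) for v in value]
    return value


content = {"Predictions": [{"class_name": "n01682714", "score": 0.7607384}]}
old_doc = to_index_doc_stringified(content)
new_doc = to_index_doc_structured(content)
```

In `old_doc` the first prediction is a single string, so `score` cannot be typed as a double; in `new_doc` it remains an object with a float `score`, which is what lets type inference work on the nested keys.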

Affected instances would need to do the following to refresh/correct the search index:

1. POST /api/deleteindex
2. POST /api/reindex (this does not delete the index first, so the delete must be done manually)
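A minimal sketch of scripting those two calls, in order, using only the Python standard library. The two endpoint paths come from the description above; the base URL is a placeholder, and how the instance authenticates admin requests is not covered in this PR, so any credentials are passed in by the caller as headers (an assumption of this sketch).

```python
import urllib.request


def build_reindex_requests(base_url, headers=None):
    """Build the delete-then-reindex POST requests, in the required order."""
    base = base_url.rstrip("/")
    return [
        urllib.request.Request(
            base + path,
            data=b"",                     # empty body; both calls are plain POSTs
            headers=dict(headers or {}),  # caller-supplied auth headers, if any
            method="POST",
        )
        for path in ("/api/deleteindex", "/api/reindex")
    ]


def refresh_search_index(base_url, headers=None):
    """Send both requests; urlopen raises on an HTTP error, so the reindex
    is not attempted if the delete fails."""
    for req in build_reindex_requests(base_url, headers):
        with urllib.request.urlopen(req) as resp:
            print(req.full_url, resp.status)


# Placeholder host; nothing is sent until refresh_search_index() is called.
requests_to_send = build_reindex_requests("https://clowder.example.org/")
```

Keeping request construction separate from sending makes it easy to check the order and URLs before running this against a real instance.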

Review Time Estimate

[ ] Immediately
[ ] Within one week
[ ] When possible

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

[ ] My change requires a change to the documentation.
[ ] I have updated the CHANGELOG.md.
[ ] I have signed the CLA
[ ] I have updated the documentation accordingly.
[ ] I have read the CONTRIBUTING document.
[ ] I have added tests to cover my changes.
[ ] All new and existing tests passed.

lmarini commented 10 months ago

To test, upload extracted metadata containing an array of JSON objects and search for keys in the objects.

lmarini commented 10 months ago

Tested with:

```json
{
    "@context": [
        "https://clowder.ncsa.illinois.edu/contexts/metadata.jsonld",
        {
            "Predictions": "http://clowder.ncsa.illinois.edu/metadata/ncsa.tensorflow-parallel-dataset-image-classification#Predictions",
            "class_name": "http://clowder.ncsa.illinois.edu/metadata/ncsa.tensorflow-parallel-dataset-image-classification#Predictions.class_name",
            "class_description": "http://clowder.ncsa.illinois.edu/metadata/ncsa.tensorflow-parallel-dataset-image-classification#Predictions.class_description",
            "score": "http://clowder.ncsa.illinois.edu/metadata/ncsa.tensorflow-parallel-dataset-image-classification#Predictions.score"
        }
    ],
    "agent": {
        "@type": "cat:extractor",
        "name": "ncsa.tensorflow-parallel-dataset-image-classification",
        "extractor_id": "https://clowder.ncsa.illinois.edu/clowder/extractors/ncsa.tensorflow-parallel-dataset-image-classification/2.3"
    },
    "content": {
        "Predictions": [
            {
                "class_name": "n01682714",
                "class_prediction": "American_chameleon",
                "score": 0.7607384
            },
            {
                "class_name": "n01693334",
                "class_prediction": "green_lizard",
                "score": 0.21042463
            },
            {
                "class_name": "n01687978",
                "class_prediction": "agama",
                "score": 0.016864877
            }
        ]
    }
}
```

clowder-framework / clowder