NCIOCPL / sitewide-search-api


API error when result from ES has multiple metatag.description values #95

Closed jfrank-nih closed 1 year ago

jfrank-nih commented 2 years ago

Issue description

Most results coming from ES to the API have only one metatag.description value, like this:

"metatag.description": "Immunotherapies are changing the landscape of cancer treatment. They work by empowering a patient's immune system to attack difficult-to-treat cancers, often leading to complete disappearance of tumors. But many patients still fail to respond to these innovative treatments, and developing immunotherapies that work for more people is a high priority."

But if an HTML page erroneously has two <meta name="description"> tags, you get an array nested inside another array:

"metatag.description": [
  [
    "Search SEER Inquiries",
    "Search SEER Inquiries"
  ]
]

This was fixed in #21 but has reappeared.

ESTIMATE TBD

Steps to reproduce the issue

Go to https://webapis.cancer.gov/sitewidesearch/v1/Search/cgov/en/cancer?from=10000&size=20000&site=all. You can adjust the from and size query parameters to narrow down to the exact results.

What's the expected result?

Normal JSON result.

What's the actual result?

500 error

Related Tickets

blairlearn commented 1 year ago

This appears to be a change in how the data is being stored in (or presented by?) Elasticsearch.

The test data from fixing #21 shows a single array of values for metatag.description:

"metatag.description": [
  "Summer Program Application",
  "Summer Program Application for the Division of Cancer Epidemiology and Genetics (DCEG)"
],

A similar search result, from https://edrn.nci.nih.gov/about-edrn/sites/80-creighton-university/lynch-henry/, now shows an array of values nested within another array:

"metatag.description": [
    [
        "Henry Lynch is an associate member of the Early Detection Research Network.",
        "Henry Lynch is an associate member of the Early Detection Research Network."
    ]
],
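
For illustration only, here is a minimal Python sketch of one way to tolerate all three shapes seen in this thread (a single string, a flat array, and an array nested in another array). It is not the fix applied in this repository, and it is written in Python only for brevity. The helper names flatten_description and first_description are hypothetical, and choosing the first value as the description is an assumption.

```python
from typing import Any, List, Optional


def flatten_description(value: Any) -> List[str]:
    """Collect every string from a scalar, a flat list, or arbitrarily nested lists."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    if isinstance(value, (list, tuple)):
        flat: List[str] = []
        for item in value:
            # Recurse so that [["a", "b"]] and ["a", "b"] both yield ["a", "b"].
            flat.extend(flatten_description(item))
        return flat
    # Unexpected scalar type: keep it as text rather than raising.
    return [str(value)]


def first_description(source: dict) -> Optional[str]:
    """Return the first description value, or None when the field is absent or empty.

    Assumption: when a page carries several descriptions, the first one is used.
    """
    values = flatten_description(source.get("metatag.description"))
    return values[0] if values else None


# The three shapes reported in this issue.
single = {"metatag.description": "Search SEER Inquiries"}
flat_array = {"metatag.description": ["Summer Program Application", "Another value"]}
nested_array = {"metatag.description": [["Search SEER Inquiries", "Search SEER Inquiries"]]}

for hit in (single, flat_array, nested_array, {}):
    print(first_description(hit))
```

Flattening recursively, rather than special-casing exactly two levels, means a further change in how Elasticsearch presents the field would not reintroduce the 500 error.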