CredentialEngine / CredentialRegistry

Repository for development of the Credential Registry
Apache License 2.0

Slow query response times when using ceterms:ownedBy #697

Open · siuc-nate opened this issue 5 months ago

siuc-nate commented 5 months ago

We are seeing unusually slow response times (15+ seconds) with the following query:

{
  "search:termGroup": {
    "search:value": [
      {
        "ceterms:name": "software",
        "ceterms:description": "software",
        "ceterms:ownedBy": {
          "ceterms:name": "software"
        },
        "search:operator": "search:orTerms"
      },
      {
        "ceterms:availableOnlineAt": "search:anyValue",
        "ceterms:availableAt": {
          "ceterms:address": {
            "ceterms:addressRegion": [
              {
                "search:value": "jersey",
                "search:matchType": "search:exactMatch"
              }
            ]
          }
        },
        "search:operator": "search:orTerms"
      }
    ],
    "search:operator": "search:andTerms"
  }
}

Removing the "ceterms:ownedBy" part makes it much faster. Is there something about traversing that particular connection that would make these queries slow?

jeannekitchens commented 5 months ago

@edgarf @excelsior this is a query being used by one of our state partners. It's a big issue to keep running into blockers that delay our partners.

excelsior commented 5 months ago

@jeannekitchens @siuc-nate I'm still working on this. The couple of solutions I've tried already negatively affected some other queries, so I couldn't deploy those. Hope to resolve it tomorrow.

excelsior commented 5 months ago

@siuc-nate I haven't fully figured out what might cause those issues. On the surface, the sheer number of branches in those subqueries may be the problem.

It may not be related, but in this particular case the last condition in each subquery returns no results, i.e. both

"ceterms:ownedBy": {
  "ceterms:name": "software"
}

and

"ceterms:availableAt": {
  "ceterms:address": {
    "ceterms:addressRegion": [
      {
        "search:value": "jersey",
        "search:matchType": "search:exactMatch"
      }
    ]
  }
}

don't affect the results at all. Removing either of them makes the query much faster; removing both makes it faster still.

Could the partner use the following query instead for the time being?

{
  "search:termGroup": {
    "search:value": [
      {
        "ceterms:name": "software",
        "ceterms:description": "software",
        "search:operator": "search:orTerms"
      },
      {
        "ceterms:availableOnlineAt": "search:anyValue"
      }
    ],
    "search:operator": "search:andTerms"
  }
}

The results will be the same.
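Why the results match can be sketched in Python, assuming each term resolves to the set of record IDs it matches, `search:orTerms` acts as a set union, and `search:andTerms` acts as a set intersection. The helper names and record IDs below are hypothetical and only illustrate the reasoning; this is not how the Registry engine is implemented.

```python
from functools import reduce

# Assumption: search:orTerms behaves like a union of the terms' match sets.
def or_terms(*term_matches):
    """A record matches an orTerms group if it matches any term in it."""
    return set().union(*term_matches)

# Assumption: search:andTerms behaves like an intersection of the groups' match sets.
def and_terms(*group_matches):
    """A record matches an andTerms group only if it matches every group."""
    return reduce(lambda acc, group: acc & group, group_matches)

# Hypothetical match sets, for illustration only.
name = {"r1", "r2", "r3"}
description = {"r2", "r4"}
owned_by_name = set()         # ceterms:ownedBy / ceterms:name matched nothing
available_online = {"r1", "r4", "r5"}
region_jersey = set()         # ceterms:availableAt / addressRegion matched nothing

original = and_terms(
    or_terms(name, description, owned_by_name),
    or_terms(available_online, region_jersey),
)
simplified = and_terms(
    or_terms(name, description),
    available_online,
)

print(sorted(original), sorted(simplified))  # ['r1', 'r4'] ['r1', 'r4']
```

The equivalence only holds while those two terms match nothing; if records with an owner named "software" or a New Jersey address are published later, the two queries may return different results.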

siuc-nate commented 5 months ago

@excelsior On a somewhat related note, this query takes ~4.5 seconds, which still seems slow:

{
  "@type": {
    "search:value": "ceterms:Credential",
    "search:matchType": "search:subClassOf"
  },
  "search:termGroup": {
    "search:value": [
      {
        "ceterms:name": "nursing",
        "ceterms:description": "nursing",
        "ceterms:ownedBy": {
          "ceterms:name": "nursing"
        },
        "search:operator": "search:orTerms"
      },
      {
        "ceterms:credentialStatusType": {
          "ceterms:targetNode": "credentialStat:Active"
        },
        "search:recordPublishedBy": "ce-cc992a07-6e17-42e5-8ed1-5b016e743e9d"
      }
    ],
    "search:operator": "search:andTerms"
  }
}

siuc-nate commented 3 months ago

@excelsior this query is taking ~26 seconds:

{
    "search:termGroup": {
        "search:operator": "search:orTerms",
        "ceterms:ownedBy": {
            "ceterms:address": { 
                "ceterms:addressRegion": [ "KS", "Kansas" ]
            }
        },
        "ceterms:offeredBy": {
            "ceterms:address": { 
                "ceterms:addressRegion": [ "KS", "Kansas" ]
            }
        }
    }
}

siuc-nate commented 1 month ago

The query from the initial post now runs in about 6 seconds.

jeannekitchens commented 3 weeks ago

@siuc-nate @excelsior 6 seconds is still slow. @excelsior @edgarf, where are we with improving overall performance?

excelsior commented 1 day ago

@jeannekitchens The latest optimization idea I implemented made slow queries 2.5 to 5 times faster (except the 4.5-second one mentioned above, which hasn't changed). Unfortunately, pushing this approach further causes regressions in some other queries, so I'll have to find another way to improve the engine's performance.

jeannekitchens commented 22 hours ago

@excelsior should we close this issue and open one that is more about overall performance?