CredentialEngine / CredentialRegistry

Repository for development of the Credential Registry
Apache License 2.0

Query Exceptions: Supporting queries that exclude classes #693

Closed jeannekitchens closed 2 months ago

jeannekitchens commented 6 months ago

Is this use case already supported? If not, what is a solution?

Use Case: There are numerous types of credentials (https://credreg.net/page/typeslist#ceterms_Credential). Typically a user wants to download all of the credentials for people. There is a QA credential type that is NOT for people. We need to be able to use the query below with QACredential excluded. Otherwise, users risk missing credentials for people as new types of credentials are added. As it stands, anyone using a query that lists out the types of credentials available at a specific point in time has this problem. E.g., numerous types of certificates were added in late 2023.

{ "@type": { "search:value": "ceterms:Credential", "search:matchType": "search:subClassOf" } }
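To illustrate why a subclass match with an exclusion holds up better than an enumerated list of types, here is a minimal Python sketch. The toy hierarchy in `SUBCLASS_OF` and the helper names `is_subclass_of` and `credentials_for_people` are hypothetical stand-ins, not the registry's schema or implementation.

```python
# Toy hierarchy: child -> parent. Illustrative only, not the real schema.
SUBCLASS_OF = {
    "ceterms:Certificate": "ceterms:Credential",
    "ceterms:Degree": "ceterms:Credential",
    "ceterms:QACredential": "ceterms:Credential",
}


def is_subclass_of(cls, ancestor, hierarchy):
    """Return True if cls equals ancestor or descends from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = hierarchy.get(cls)
    return False


def credentials_for_people(types, hierarchy):
    """Every subclass of Credential except QACredential (and its subclasses)."""
    return sorted(
        t for t in types
        if is_subclass_of(t, "ceterms:Credential", hierarchy)
        and not is_subclass_of(t, "ceterms:QACredential", hierarchy)
    )


# A type added later is picked up without changing the query logic.
later = {**SUBCLASS_OF, "ceterms:NewCertificate": "ceterms:Certificate"}
```

A query that hard-codes `["ceterms:Certificate", "ceterms:Degree"]` would miss the hypothetical `ceterms:NewCertificate`, while the subclass-minus-exception form includes it automatically.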

siuc-nate commented 6 months ago

More generally, it would be nice if queries supported some kind of exception or "not match" feature. I can think of a few ways that might work, but we'd need to carefully think through the side-effects of each approach.

Some ideas:


Approach 1: Use of a value object with a new term, search:notValue. This would entail creating a value object (see below) for anything that needs to touch on inversion, and extending its functionality as needed.

Approach 1, Case 1: All credential types except QA Credentials

{
  "@type": {
    "search:value": "ceterms:Credential",
    "search:matchType": "search:subClassOf",
    "search:notValue": "ceterms:QACredential"
  }
}

Approach 1, Case 2: Coding certificates (except for medical coding)

{
  "@type": "ceterms:Certificate",
  "ceterms:name": {
    "search:value": "coding",
    "search:matchType": "search:contains",
    "search:notValue": "medical"
  }
}

Approach 1, Case 3: Degrees offered by organizations that are anywhere except for Illinois or Kansas

{
  "@type": "ceterms:Degree",
  "ceterms:offeredBy": {
    "ceterms:address": {
      "ceterms:addressRegion": {
        "search:value": "search:anyValue",
        "search:notValue": [ "IL", "Illinois", "KS", "Kansas" ]
      }
    }
  }
}

Approach 1, Case 4: Organizations that own certificates for programming in Illinois, but not counting it as a match if the certificate is for television programming in Chicago. (It's ambiguous whether an organization that had both should/would be included in the results. Perhaps an additional term like "ignoreValue" could be used to allow explicit exclusion...)

{
  "@type": "ceterms:Organization",
  "^ceasn:ownedBy": {
    "search:value": {
      "@type": "ceterms:Certificate",
      "ceterms:name": "programming",
      "ceterms:availableAt": {
        "ceterms:addressRegion": [ "IL", "Illinois" ]
      }
    },
    "search:notValue": {
      "@type": "ceterms:Certificate",
      "ceterms:name": "television",
      "ceterms:availableAt": {
        "ceterms:addressLocality": [ "Chicago" ]
      }
    }
  }
}

Approach 1, Case 5: Double inversion: Give me all credentials except for degrees unless the degree happens to be an associate degree in which case include it (the nesting here hopefully avoids ambiguity, but would require some kind of cascading logic to ensure the deeper double-negative overrides the higher-level single negative):

{
  "@type": {
    "search:value": "ceterms:Credential",
    "search:matchType": "search:subClassOf",
    "search:notValue": {
      "search:value": "ceterms:Degree",
      "search:matchType": "search:subClassOf",
      "search:notValue": {
        "search:value": "ceterms:AssociateDegree"
      }
    }
  }
}
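The cascading logic that Case 5 would need can be sketched as a small recursive evaluator. This is only one possible reading of the proposal, written in Python: the `PARENT` hierarchy and the `descends_from` and `matches` helpers are hypothetical, and the rule "a nested search:notValue overrides the exclusion above it" is an assumption taken from the description above.

```python
# Toy hierarchy: child -> parent. Illustrative only, not the real schema.
PARENT = {
    "ceterms:Certificate": "ceterms:Credential",
    "ceterms:Degree": "ceterms:Credential",
    "ceterms:AssociateDegree": "ceterms:Degree",
    "ceterms:BachelorDegree": "ceterms:Degree",
}


def descends_from(cls, ancestor):
    """Return True if cls equals ancestor or is one of its descendants."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def matches(cls, spec):
    """Evaluate a plain string, a list of specs, or a value object."""
    if isinstance(spec, str):
        return cls == spec
    if isinstance(spec, list):
        return any(matches(cls, item) for item in spec)
    value = spec["search:value"]
    if spec.get("search:matchType") == "search:subClassOf":
        hit = descends_from(cls, value)
    else:
        hit = cls == value
    excluded = spec.get("search:notValue")
    if hit and excluded is not None:
        # The exclusion is itself a spec, so deeper exceptions cascade.
        return not matches(cls, excluded)
    return hit


QUERY = {
    "search:value": "ceterms:Credential",
    "search:matchType": "search:subClassOf",
    "search:notValue": {
        "search:value": "ceterms:Degree",
        "search:matchType": "search:subClassOf",
        "search:notValue": {"search:value": "ceterms:AssociateDegree"},
    },
}
```

With this reading, an associate degree passes because the innermost exception cancels the degree exclusion, while any other degree is rejected.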

Approach 1, Case 6: Multiple potential matches with different exceptions in some of them

{
  "@type": "ceterms:MicroCredential",
  "ceterms:ownedBy": [
    {
      "ceterms:name": {
        "search:value": "search:anyValue",
        "search:notValue": "don't want this"
      },
      "ceterms:address": {
        "ceterms:addressRegion": [ "IL", "Illinois" ]
      }
    },
    {
      "ceterms:name": "search:anyValue",
      "ceterms:address": {
        "ceterms:addressRegion": [ "KS", "Kansas" ]
      }
    },
    {
      "ceterms:name": {
        "search:value": "search:anyValue",
        "search:notValue": "don't want that"
      },
      "ceterms:address": {
        "ceterms:addressRegion": [ "IN", "Indiana" ]
      }
    }
  ]
}

Approach 2: Prefixing (nearly) any term with ! to indicate that it should be treated as a negative. There are a few search terms (like termGroup) where this wouldn't make sense, and how it interacts with ^ probably needs more thought. An interesting twist is that the ! makes the property a different string, so the same property can show up twice in one object without breaking JSON - though this is probably a double-edged sword.
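The point about `!` producing a distinct key can be checked with any JSON parser. The following Python sketch uses the standard `json` module; the `split_terms` helper is hypothetical and only shows how a consumer might separate the two kinds of keys.

```python
import json

# Two identical keys: Python's json module silently keeps only the last one.
duplicate = json.loads('{"ceterms:name": "coding", "ceterms:name": "medical"}')

# With the "!" prefix the keys are different strings, so both survive parsing.
prefixed = json.loads('{"ceterms:name": "coding", "!ceterms:name": "medical"}')


def split_terms(query):
    """Separate positive terms from "!"-prefixed (negated) terms."""
    positive = {k: v for k, v in query.items() if not k.startswith("!")}
    negative = {k[1:]: v for k, v in query.items() if k.startswith("!")}
    return positive, negative
```

The double-edged part is that a property can still appear only once in each form, so two different negations of the same property in one object would collide just like ordinary duplicate keys.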

Approach 2, Case 1: All credential types except QA Credentials

{
  "@type": {
    "search:value": "ceterms:Credential",
    "search:matchType": "search:subClassOf"
  },
  "!@type": "ceterms:QACredential"
}

Approach 2, Case 2: Coding certificates (except for medical coding).

{
  "@type": "ceterms:Certificate",
  "ceterms:name": "coding",
  "!ceterms:name": "medical"
}

Approach 2, Case 3: Degrees offered by organizations that are anywhere except for Illinois or Kansas

{
  "@type": "ceterms:Degree",
  "ceterms:offeredBy": {
    "ceterms:address": {
      "ceterms:addressRegion": "search:anyValue",
      "!ceterms:addressRegion": [ "IL", "Illinois", "KS", "Kansas" ]
    }
  }
}

Approach 2, Case 4: Organizations that own certificates for programming in Illinois, but not counting it as a match if the certificate is for television programming in Chicago. It's still a little ambiguous as to whether an organization that meets both criteria should/would be in the result set. This is also here to show reverse traversal ^ and inversion ! in use on the same term (I don't currently think we should negate reverse traversal, so the two don't interact and could appear in either order, but this deserves some more thought too).

{
  "@type": "ceterms:Organization",
  "^ceasn:ownedBy": {
    "@type": "ceterms:Certificate",
    "ceterms:name": "programming",
    "ceterms:availableAt": {
      "ceterms:addressRegion": [ "IL", "Illinois" ]
    }
  },
  "!^ceasn:ownedBy": {
    "@type": "ceterms:Certificate",
    "ceterms:name": "television",
    "ceterms:availableAt": {
      "ceterms:addressLocality": [ "Chicago" ]
    }
  }
}

Approach 2, Case 5: Double inversion. I could think of two ways to do this - depending on if/how cascading/overriding works, one or the other (or both) might yield the correct/same result set:

{
  "@type": [
    {
      "search:value": "ceterms:Credential",
      "search:matchType": "search:subClassOf"
    },
    "ceterms:AssociateDegree"
  ],
  "!@type": {
    "search:value": "ceterms:Degree",
    "search:matchType": "search:subClassOf"
  }
}

{
  "@type": {
    "search:value": "ceterms:Credential",
    "search:matchType": "search:subClassOf",
    "!search:value": {
      "search:value": "ceterms:Degree",
      "search:matchType": "search:subClassOf",
      "!search:value": {
        "search:value": "ceterms:AssociateDegree"
      }
    }
  }
}

Approach 2, Case 6: Multiple potential matches with different exceptions in some of them. I wonder if it would be better to assume an implicit search:anyValue except for ... when only the negated property appears?

{
  "@type": "ceterms:MicroCredential",
  "ceterms:ownedBy": [
    {
      "ceterms:name": "search:anyValue",
      "!ceterms:name": "don't want this",
      "ceterms:address": {
        "ceterms:addressRegion": [ "IL", "Illinois" ]
      }
    },
    {
      "ceterms:name": "search:anyValue",
      "ceterms:address": {
        "ceterms:addressRegion": [ "KS", "Kansas" ]
      }
    },
    {
      "ceterms:name": "search:anyValue",
      "!ceterms:name": "don't want that",
      "ceterms:address": {
        "ceterms:addressRegion": [ "IN", "Indiana" ]
      }
    }
  ]
}

Approach 3: Term groups and search:notTerms. The idea of "search:operator" : "search:notTerms" existed at some point a long time ago (I think back when we were using SPARQL), but it seems to have fallen off the table. Maybe for the best, given how unwieldy some of these get:

Approach 3, Case 1: All credential types except QA Credentials

{
  "@type": {
    "search:value": "ceterms:Credential",
    "search:matchType": "search:subClassOf"
  },
  "search:termGroup": {
    "search:operator": "search:notTerms",
    "@type": "ceterms:QACredential"
  }
}

Approach 3, Case 2: Coding certificates (except for medical coding).

{
  "@type": "ceterms:Certificate",
  "ceterms:name": "coding",
  "search:termGroup": {
    "search:operator": "search:notTerms",
    "ceterms:name": "medical"
  }
}

Approach 3, Case 3: Degrees offered by organizations that are anywhere except for Illinois or Kansas. This one begins to show how this approach can be painful to construct. It requires a term group to enable two instances of addressRegion, which means using an array, but the items in the array need to be ANDed together (hence the andTerms), and then within those items, one of them gets inverted...

{
  "@type": "ceterms:Degree",
  "ceterms:offeredBy": {
    "ceterms:address": {
      "search:termGroup": {
        "search:value": [
          {
            "ceterms:addressRegion": "search:anyValue"
          },
          {
            "search:operator": "search:notTerms",
            "ceterms:addressRegion": [ "IL", "Illinois", "KS", "Kansas" ]
          }
        ],
        "search:operator": "search:andTerms"
      }
    }
  }
}

Approach 3, Case 4: Organizations that own certificates for programming in Illinois, but not counting it as a match if the certificate is for television programming in Chicago.

{
  "@type": "ceterms:Organization",
  "^ceasn:ownedBy": {
    "search:termGroup": {
      "search:value": [
        {
          "@type": "ceterms:Certificate",
          "ceterms:name": "programming",
          "ceterms:availableAt": {
            "ceterms:addressRegion": [ "IL", "Illinois" ]
          }
        },
        {
          "search:operator": "search:notTerms",
          "@type": "ceterms:Certificate",
          "ceterms:name": "television",
          "ceterms:availableAt": {
            "ceterms:addressLocality": [ "Chicago" ]
          }
        }
      ],
      "search:operator": "search:andTerms"
    }

  }
}

Approach 3, Case 5: Double inversion. This one could also support a more nested alternative, but my brain hurts thinking about that.

{
  "@type": {
    "search:termGroup": {
      "search:value": [
        {
          "search:value": "ceterms:Credential",
          "search:matchType": "search:subClassOf"
        },
        {
          "search:operator": "search:notTerms",
          "search:value": "ceterms:Degree",
          "search:matchType": "search:subClassOf"
        },
        {
          "search:value": "ceterms:AssociateDegree"
        }
      ],
      "search:operator": "search:andTerms"
    }
  }
}

Approach 3, Case 6: Multiple potential matches with different exceptions in some of them. Term Group was probably a mistake.

{
  "@type": "ceterms:MicroCredential",
  "ceterms:ownedBy": [
    {
      "search:termGroup": {
        "search:value": [
          {
            "ceterms:name": "search:anyValue"
          },
          {
            "search:operator": "search:notTerms",
            "ceterms:name": "don't want this"
          }
        ],
        "search:operator": "search:andTerms"
      },
      "ceterms:address": {
        "ceterms:addressRegion": [ "IL", "Illinois" ]
      }
    },
    {
      "ceterms:name": "search:anyValue",
      "ceterms:address": {
        "ceterms:addressRegion": [ "KS", "Kansas" ]
      }
    },
    {
      "search:termGroup": {
        "search:value": [
          {
            "ceterms:name": "search:anyValue"
          },
          {
            "search:operator": "search:notTerms",
            "ceterms:name": "don't want that"
          }
        ],
        "search:operator": "search:andTerms"
      },
      "ceterms:address": {
        "ceterms:addressRegion": [ "IN", "Indiana" ]
      }
    }
  ]
}

Approach 4: Exception Query. This approach basically involves sending two separate queries at the API level, one of which works like normal, and the other of which is entirely dedicated to removing things from the result set of the first query. It's pretty clean, and it mostly works...mostly. More on that later.
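Conceptually, the exception query is a set difference applied to the results of the main query. Here is a minimal Python sketch of that idea; `run_with_exception`, `simple_matches`, and the sample records are hypothetical, and the matcher assumes a case-insensitive substring test rather than the registry's real matching rules.

```python
def simple_matches(record, query):
    """Hypothetical matcher: every term must occur in the record's value."""
    return all(
        str(expected).lower() in str(record.get(key, "")).lower()
        for key, expected in query.items()
    )


def run_with_exception(records, query, exception_query, matches=simple_matches):
    """Return records matching query, minus those matching exception_query."""
    included = [r for r in records if matches(r, query)]
    return [r for r in included if not matches(r, exception_query)]


RECORDS = [
    {"@type": "ceterms:Certificate", "ceterms:name": "Python Coding"},
    {"@type": "ceterms:Certificate", "ceterms:name": "Medical Coding"},
    {"@type": "ceterms:Degree", "ceterms:name": "Coding Theory"},
]

result = run_with_exception(
    RECORDS,
    {"@type": "ceterms:Certificate", "ceterms:name": "coding"},
    {"ceterms:name": "medical"},
)
```

Because the exception is evaluated against whole records, it has no knowledge of which part of the main query it was meant to qualify.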

Approach 4, Case 1: All credential types except QA Credentials

{
  "Query": {
    "@type": {
      "search:value": "ceterms:Credential",
      "search:matchType": "search:subClassOf"
    }
  },
  "ExceptionQuery": {
    "@type": "ceterms:QACredential"
  }
}

Approach 4, Case 2: Coding certificates (except for medical coding).

{
  "Query": {
    "@type": "ceterms:Certificate",
    "ceterms:name": "coding"
  },
  "ExceptionQuery": {
    "ceterms:name": "medical"
  }
}

Approach 4, Case 3: Degrees offered by organizations that are anywhere except for Illinois or Kansas.

{
  "Query": {
    "@type": "ceterms:Degree",
    "ceterms:offeredBy": {
      "ceterms:address": {
        "ceterms:addressRegion": "search:anyValue"
      }
    }
  },
  "ExceptionQuery": {
    "ceterms:offeredBy": {
      "ceterms:address": {
        "ceterms:addressRegion": [ "IL", "Illinois", "KS", "Kansas" ]
      }
    }
  }
}

Approach 4, Case 4: Organizations that own certificates for programming in Illinois, but not counting it as a match if the certificate is for television programming in Chicago.

{
  "Query": {
    "@type": "ceterms:Organization",
    "^ceterms:ownedBy": {
      "@type": "ceterms:Certificate",
      "ceterms:name": "programming",
      "ceterms:availableAt": {
        "ceterms:addressRegion": [ "IL", "Illinois" ]
      }
    }
  },
  "ExceptionQuery": {
    "^ceterms:ownedBy": {
      "@type": "ceterms:Certificate",
      "ceterms:name": "television",
      "ceterms:availableAt": {
        "ceterms:addressLocality": [ "Chicago" ]
      }
    }
  }
}

Approach 4, Case 5: Double inversion. This approach would not allow nesting, so multiple layers of inversion could get tricky to figure out.

{
  "Query": {
    "@type": [
      {
        "search:value": "ceterms:Credential",
        "search:matchType": "search:subClassOf"
      },
      "ceterms:AssociateDegree"
    ]
  },
  "ExceptionQuery": {
    "@type": {
      "search:value": "ceterms:Degree",
      "search:matchType": "search:subClassOf"
    }
  }
}

Approach 4, Case 6: Multiple potential matches with different exceptions in some of them. I contrived this example specifically to break this approach, because I wanted to figure out its weaknesses. Here, because the exceptions have been completely decontextualized from their relevant places, there is no way to know which exception goes with which item in the array. This could potentially be worked around by ensuring the items are in the same order and passing empty/placeholder objects in the exception query, I suppose, but that might still break down in an even more complex case.

{
  "Query": {
    "@type": "ceterms:MicroCredential",
    "ceterms:ownedBy": [
      {
        "ceterms:name": "search:anyValue",
        "ceterms:address": {
          "ceterms:addressRegion": [ "IL", "Illinois" ]
        }
      },
      {
        "ceterms:name": "search:anyValue",
        "ceterms:address": {
          "ceterms:addressRegion": [ "KS", "Kansas" ]
        }
      },
      {
        "ceterms:name": "search:anyValue",
        "ceterms:address": {
          "ceterms:addressRegion": [ "IN", "Indiana" ]
        }
      }
    ]
  },
  "ExceptionQuery": {
    "ceterms:ownedBy": [
      {
        "!ceterms:name": "don't want this"
      },
      {
        "!ceterms:name": "don't want that"
      }
    ]
  }
}

excelsior commented 6 months ago

@siuc-nate Thanks for the ideas! I personally like the second approach the most, mostly because it offers the cleanest syntax of them all. The last two look too cumbersome IMO. And yes, the negated reverse connections may be tricky, but I think we should give it a shot.

jeannekitchens commented 6 months ago

@siuc-nate thank you for the thorough evaluation! @excelsior thank you for giving this a shot. I also put this on our next meeting agenda to discuss further. What is the downside?

siuc-nate commented 6 months ago

The things I ran into with the second approach while coming up with those examples are below.

The first issue is how (if at all) ! interacts with ^ when both are present. I don't think it should turn a reverse traversal into a forward traversal, as that wouldn't make sense in a lot of scenarios. Instead it makes more sense to treat the two independently, i.e. "where something does not have a reverse connection to something else..."

In other words:

{
  "@type": "ceterms:Certificate",
  "!^ceterms:offers": {
    "ceterms:name": "abc"
  }
}

means "any certificate that does not have a reverse offers connection to an organization with abc in its name".

It does not mean "any certificate that ceterms:offers an organization with abc in its name" because that doesn't make sense schema-wise.

It also does not mean "any certificate that is ceterms:offeredBy an organization with abc in its name" because:

The second issue is whether or not someone needs to explicitly provide the property with search:anyValue when they are asking for "something that has any value for this property except for whatever value I have provided".

In other words, are these two queries exactly equivalent?

{
  "ceterms:name": "search:anyValue",
  "!ceterms:name": "some text"
}

{
  "!ceterms:name": "some text"
}

I can't see a reason why they wouldn't be considered exactly equivalent, but I may be missing something.
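One way to probe the equivalence question is to make the handling of a missing property explicit. The sketch below is a hypothetical Python evaluator, not the registry's implementation: `term_matches`, `query_matches`, and the `missing_passes` flag are invented for illustration, and matching is assumed to be a case-insensitive substring test.

```python
ANY = "search:anyValue"


def term_matches(record, key, expected):
    """True if the record has the property and it satisfies `expected`."""
    value = record.get(key)
    if value is None:
        return False
    if expected == ANY:
        return True
    return expected.lower() in value.lower()


def query_matches(record, query, missing_passes=False):
    """Evaluate a flat query whose keys may carry a "!" prefix."""
    for key, expected in query.items():
        if key.startswith("!"):
            prop = key[1:]
            if record.get(prop) is None:
                # The open question: does an absent property satisfy "!"?
                if not missing_passes:
                    return False
            elif term_matches(record, prop, expected):
                return False
        elif not term_matches(record, key, expected):
            return False
    return True


EXPLICIT = {"ceterms:name": ANY, "!ceterms:name": "some text"}
IMPLICIT = {"!ceterms:name": "some text"}
RECORDS = [{"ceterms:name": "Some Text Here"}, {"ceterms:name": "Other"}, {}]
```

If a negated term implies that the property exists (`missing_passes=False`), the two queries select the same records. If an absent property is allowed to satisfy the negation (`missing_passes=True`), only the second query also returns records with no name at all, so the two are no longer equivalent.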

I suppose a similarly-themed question would be whether these two are exactly equivalent:

{
  "ceterms:requires": "search:noValue"
}

{
  "!ceterms:requires": "search:anyValue"
}

The third issue wasn't exclusive to the second approach. It had to do with multiple/nested exceptions, e.g. "I want all of x except for y except for z" where the desire is to receive x and z but not y (unless the y is a z).

Example: Suppose I want everything with an address in the US, except for addresses in Illinois and Indiana (unless those addresses are in Chicago or Indianapolis, in which case, include them). This is admittedly a contrived example, but hopefully it illustrates what I'm getting at:

{
  "ceterms:address": {
    "ceterms:addressCountry": "USA",
    "!ceterms:addressRegion": {
      "search:value": [ "IL", "IN" ],
      "!search:value": {
        "ceterms:addressLocality": [ "Chicago", "Indianapolis" ]
      }
    }
  }
}

I don't know if that's something we would need (or want) to support, but I imagine it may come up eventually, so it's worth thinking about.

The other example of this, which is maybe a little more realistic, is the one I gave in my previous post: I want all credentials, except for degrees, unless the degree is an Associate Degree, in which case include it. This one also mixes in the complexity of the "search:matchType": "search:subClassOf" logic. I can think of two potential structures for this - one nested, one not - but I'm not sure if the two queries would (or should) really be equivalent:

{
  "@type": [
    {
      "search:value": "ceterms:Credential",
      "search:matchType": "search:subClassOf"
    },
    "ceterms:AssociateDegree"
  ],
  "!@type": {
    "search:value": "ceterms:Degree",
    "search:matchType": "search:subClassOf"
  }
}

{
  "@type": {
    "search:value": "ceterms:Credential",
    "search:matchType": "search:subClassOf",
    "!search:value": {
      "search:value": "ceterms:Degree",
      "search:matchType": "search:subClassOf",
      "!search:value": {
        "search:value": "ceterms:AssociateDegree"
      }
    }
  }
}

There are probably other things worth thinking through, here. We should look through some of the other complex queries we've had before and see what would happen to them if different parts were negated.

One more, just for fun: What would this return?

{
  "!ceterms:requires": {
    "!ceterms:targetAssessment": "search:noValue"
  }
}

excelsior commented 4 months ago

@siuc-nate I implemented it using the second approach, i.e. supporting the ! prefix. If present, it should come first. E.g. "!^ceterms:ownedBy" is correct, but "^!ceterms:ownedBy" isn't.
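The ordering rule can be sketched as a small key parser. This is a hypothetical Python illustration of the rule as stated in this comment, not the code that was deployed; `parse_key` and its return shape are assumptions.

```python
def parse_key(key):
    """Split a query key into (negated, reverse, property).

    Assumes "!" must come before "^" when both are present.
    """
    negated = key.startswith("!")
    if negated:
        key = key[1:]
    reverse = key.startswith("^")
    if reverse:
        key = key[1:]
    if key.startswith(("!", "^")):
        # Covers "^!..." as well as repeated prefixes such as "!!...".
        raise ValueError(f"misplaced prefix in {key!r}")
    return negated, reverse, key
```

For example, `parse_key("!^ceterms:ownedBy")` yields a negated reverse traversal of `ceterms:ownedBy`, while `parse_key("^!ceterms:ownedBy")` raises `ValueError`.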

The feature is available on the sandbox.

siuc-nate commented 4 months ago

Thanks. I will take a look when time allows.

siuc-nate commented 3 months ago

Looks like this is working, thanks.