CredentialEngine / Schema-Development

Development of the vocabularies for the CTI models

Schema Manager Updates #800

Closed siuc-nate closed 2 weeks ago

siuc-nate commented 2 years ago

This is to consolidate the various other issues regarding limitations of the schema management system, and track any new ones that come up.

General:


Content management enhancements:


Updates to the schema.org mapping for Embeddable Credentials (EOCreds):


Other Ideas

stuartasutton commented 2 years ago

@siuc-nate, you state with regard to the need for "cataloguing guidance/best practices" like the best practices with DataOne:

"May be somewhat redundant with the information in the handbooks (or it may replace some of that information)"

I don't agree. Nowhere does the Handbook get down in the weeds with best practices. The Handbook and a Best Practices Guide are complementary. Look at the 10 entries here: https://github.com/CredentialEngine/CatalogingGuidance. Check out Issue 6.

The Handbook should not be filled with such advice. The Best Practices Guide could/should be populated with advice in all areas of creating data and picking properties where discretion is exercised.

stuartasutton commented 2 years ago

@siuc-nate, you state with regard to the need for a means to "comment" on each term table (could also be in the release history for terms in pending):

"Would need some basic anti-spam prevention and some kind of a display page"

As for the display page, there would be no need for any sort of public display. Comments that lead to issues can be raised in github issues by those monitoring incoming comments.

We currently have no simple, single, easily discovered, open mechanism for commenting on terms that isn't constraining for some (e.g., GitHub) and that can be found near the points in terms and other documentation where comments might arise. I've been around this project for a while, and I don't know where comments on terms should be registered except on GitHub. Perhaps I SHOULD know and don't, so consider me the canary in the coal mine: others probably haven't got a clue either.

siuc-nate commented 2 years ago

@stuartasutton

"I don't agree. Nowhere does the Handbook get down in the weeds with best practices. The Handbook and a Best Practices Guide are complementary. Look at the 10 entries here: https://github.com/CredentialEngine/CatalogingGuidance. Check out Issue 6."

Got it, updated the original post

stuartasutton commented 2 years ago

As for #638 Enable inverse property declarations, the issue is not so much whether we can include the inverseOf property in terms of CTDL declarations, but rather how those declarations are handled by the Registry. Does the registry automatically create the inverse data when it encounters actual data for an inverseOf property; e.g.

Schema declaration: "husband" inverseOf "wife"
Data in DB: "Shakespeare wife Hathaway"
Query 1: "Who is Shakespeare's wife?"
Query 2: "Who is Hathaway's husband?" (inverse)

Will we be able to do Query 2?

siuc-nate commented 2 years ago

It does not, mostly because the data comes from many sources and our policy requires primary-source (directly or by proxy) information. Allowing automatic inverse connections would:

We can still accommodate your queries though, since the search API enables crawling connections in reverse:

//Given this data
(person:Shakespeare)-[hasWife]->(person:Hathaway)
//Find Shakespeare's wife 
//(literally: return all objects where the "hasWife" connection originates from "person:Shakespeare")
{
  "^hasWife": {
    "@id": "person:Shakespeare"
  }
}

//Find Hathaway's husband 
//(literally: return all objects where "hasWife" references "person:Hathaway")
{
  "hasWife": {
    "@id": "person:Hathaway"
  }
}
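
For readers unfamiliar with reverse crawling, the two queries above can be mimicked with a minimal Python sketch. This is only an illustration over an in-memory list of triples, not the search API itself; `TRIPLES`, `objects_of`, and `subjects_of` are hypothetical names introduced here.

```python
# Hypothetical in-memory stand-in for the registry data (not the real search API).
# Only the single asserted connection from the example above is stored.
TRIPLES = [("person:Shakespeare", "hasWife", "person:Hathaway")]


def objects_of(subject, predicate, triples=TRIPLES):
    """Follow a connection forward: subject -[predicate]-> ?"""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects_of(predicate, obj, triples=TRIPLES):
    """Crawl the same connection in reverse: ? -[predicate]-> obj"""
    return [s for s, p, o in triples if p == predicate and o == obj]


# Shakespeare's wife: objects reached from Shakespeare via hasWife.
shakespeares_wife = objects_of("person:Shakespeare", "hasWife")

# Hathaway's husband: subjects whose hasWife points at Hathaway.
hathaways_husband = subjects_of("hasWife", "person:Hathaway")
```

Both lookups read the one asserted `hasWife` triple; no `hasHusband` triple is created or needed.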

stuartasutton commented 2 years ago

Nate, but let's be clear, you are talking about Registry policy. Declaring inverse properties in CTDL has no such policy constraints.

As for the API solving the problem, I don't think your result does what the following intends:

{
  "hasHusband": {
    "@id": "person:Hathaway"
  }
}

In other words, there is no "Hathaway hasHusband Shakespeare" triple directly added to the database when the triple "Shakespeare hasWife Hathaway" is added (as could be in a triplestore) or handled at the time of query. While the inverse can be inferred by humans from your result, there's nothing definitive. There is no policy constraint on Shakespeare asserting that Hathaway married him.

siuc-nate commented 2 years ago

"Nate, but let's be clear, you are talking about Registry policy. Declaring inverse properties in CTDL has no such policy constraints."

True, but your question was specifically about the registry:

"Does the registry automatically create the inverse data when it encounters actual data for an inverseOf property; e.g. [...]"


"In other words, there is no Hathaway hasHusband Shakespeare triple directly added to the database (as could be in a triplestore) or handled at the time of query."

Correct, because none was asserted in the source/first-party data.

In someone else's implementation that doesn't care about that, they could turn on the automatic inverse calculations and have the generated hasHusband property; but again, you asked about the Registry. In any event, you're right that CTDL doesn't disallow inverse properties, but the schema manager doesn't currently have a means of supporting them (hence the bullet point in the original post above).

stuartasutton commented 2 years ago

@siuc-nate, there is a difference between being able to handle inverse properties and having a policy that says "no". I think we are on the same page there. But there is a layer below that in terms of the Registry where we can do it but have a blanket policy (at the moment) that says we don't.

philbarker commented 2 years ago

I think I am just reiterating what Stuart has said (and maybe what Nate knows), but it's important to remember that declaring hasHusband as an inverse of hasWife does not mean that you have to add a complementary hasWife property every time someone adds a hasHusband property. In fact, it means that you do not have to add one (if you're willing to trust inferences).

Declaring inverse properties would embed the potential of reverse searches into CTDL. It's actually no different to other term-to-term or concept-to-concept relationships like rdfs:subPropertyOf, rdfs:subClassOf, owl:equivalentClass, owl:equivalentProperty, skos:exactMatch, skos:narrower and so on: it's just another relationship that may be used when broadening searches to include results inferred from the schema rather than directly asserted in the data.
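
As a rough illustration of broadening a search with an inverse declaration at query time, here is a minimal Python sketch. It assumes a toy schema table and triple list; `INVERSE_OF`, `ASSERTED`, `query_objects`, and the `infer_inverses` flag are hypothetical and do not describe any existing CTDL or Registry tooling.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical schema-level declaration: hasHusband inverseOf hasWife.
INVERSE_OF: Dict[str, str] = {"hasHusband": "hasWife", "hasWife": "hasHusband"}

# The only triple asserted in the data.
ASSERTED: List[Triple] = [("person:Shakespeare", "hasWife", "person:Hathaway")]


def query_objects(
    subject: str,
    predicate: str,
    triples: List[Triple],
    infer_inverses: bool = False,
) -> Set[str]:
    """Return objects of subject -[predicate]-> ?, optionally broadened by the schema."""
    # Directly asserted matches.
    results = {o for s, p, o in triples if s == subject and p == predicate}
    # Optional broadening: read asserted triples of the inverse property backwards.
    if infer_inverses and predicate in INVERSE_OF:
        inverse = INVERSE_OF[predicate]
        results |= {s for s, p, o in triples if p == inverse and o == subject}
    return results


asserted_only = query_objects("person:Hathaway", "hasHusband", ASSERTED)
with_inference = query_objects(
    "person:Hathaway", "hasHusband", ASSERTED, infer_inverses=True
)
```

With the flag off, only asserted data is returned; with it on, the schema declaration supplies the inferred answer without any new triple being stored. Whether to enable that broadening is an implementation choice separate from the declaration itself.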

siuc-nate commented 2 years ago

@stuartasutton Yes, I think we're saying the same thing.

@philbarker Whether the data is actually present, or appears to be present because it's inferenced, the result is the same (at least as far as the search API goes) - it will appear that there are inverse connections where none exist in the real data. That leads to a (QA Org)-[accredits]->(credential) appearing to be there any time a credential (perhaps incorrectly) asserts (credential)-[accreditedBy]->(QA Org). We don't want false/unconfirmed inverse assertions to be part of the data set, inferenced or not.
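
The concern can be made concrete with a small Python sketch of store-time materialization. It is not how the Registry works; `INVERSE_OF`, `materialize_inverses`, and the identifiers `credential:Example` and `org:ExampleQA` are hypothetical and used only for illustration.

```python
# Hypothetical inverse declarations for the two properties discussed above.
INVERSE_OF = {"accreditedBy": "accredits", "accredits": "accreditedBy"}


def materialize_inverses(triples):
    """Return the asserted triples plus a generated inverse for each declared pair."""
    generated = {(o, INVERSE_OF[p], s) for s, p, o in triples if p in INVERSE_OF}
    return set(triples) | generated


# Only the credential's publisher has asserted anything.
asserted = {("credential:Example", "accreditedBy", "org:ExampleQA")}

expanded = materialize_inverses(asserted)

# Triples that now appear in the data set although no source asserted them.
unconfirmed = expanded - asserted
```

Here `unconfirmed` holds an `accredits` statement attributed to the QA organization even though that organization never published it, which is the situation the primary-source policy is meant to avoid.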

philbarker commented 2 years ago

@siuc-nate

"@philbarker Whether the data is actually present, or appears to be present because it's inferenced, the result is the same (at least as far as the search API goes) - it will appear that there are inverse connections where none exist in the real data. That leads to a (QA Org)-[accredits]->(credential) appearing to be there any time a credential (perhaps incorrectly) asserts (credential)-[accreditedBy]->(QA Org)."

That would only happen if you choose (or chose) to implement the search API that way. If it's not the behavior you want why would you choose to do it?

siuc-nate commented 2 years ago

We wouldn't; that's the point I was making.

philbarker commented 2 years ago

So I don't see the problem. It's a change to CTDL that won't affect the registry.

philbarker commented 2 years ago

I think it was a mistake to merge the discussion of over a dozen different issues into one thread. Projects and tags would be a better way of keeping track of several issues that relate to the same component.

siuc-nate commented 2 years ago

"It's a change to CTDL that won't affect the registry."

Agreed. But Stuart asked me about the registry, which is why it came up.

"I think it was a mistake to merge the discussion of over a dozen different issues into one thread."

I disagree. The majority of these had only one post in their respective threads and are small enough items that I don't see a problem aggregating them together. Projects and tags mean we'd still have a dozen extra issues scattered throughout our issue list. We can reopen a closed issue if it becomes significant enough.

siuc-nate commented 2 weeks ago

Archiving this to reduce clutter. It will still be a very useful checklist of things for the new schema manager to be able to do once I have time to resume working on it.