Schema Manager Updates

siuc-nate commented 2 years ago

This is to consolidate the various other issues regarding limitations of the schema management system, and track any new ones that come up.

General:

Editor improvements
- Determine improvements to the schema editor based on everything I've learned since it was originally built
- Better handling/interfaces for large-scale changes (e.g. large changes to domains/ranges)
- Better integration for history tracking (some means of automatically generating history based on changes)
- Better import/export tools
- Some mechanism that keeps me from needing to manually change all of the "pending" items to "stable"/"unstable" at release time/prior to creating a new version
Infrastructure improvements
- Remove dependence on Neo4j (find another lightweight database solution instead)
- Better generation of context files
- Better generation of schema files
- More robust handling of cross-schema connections (these work now, but weren't part of the original design, so the implementation is somewhat fragile)
- Determine whether there's a better way to handle schema versioning internally
- Better handling for property grouping internally
- Better handling for a pending release/pending changes (maybe stored as changes to the current schema rather than a separate copy of the entire schema? That would allow for terms to show up under pending even if the term itself already exists as stable)
#749 Enable getting the schema serialization without the concept schemes/concepts
- It should be possible to achieve this with an additional download option on the schema page
#763 Enable export of SHACL for the schema/policy
Support the use of meta:TermStatusType throughout the schema manager
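
One way to picture the "pending changes stored as changes to the current schema" idea above is as an overlay applied on top of the released terms. This is only a minimal sketch in Python, not how the schema manager works: the `ex:` terms, the field names, and the `preview`/`release` helpers are all hypothetical, and only the "pending"/"stable" status labels come from this post.

```python
from typing import Dict, Optional

Term = Dict[str, str]

# Hypothetical released schema, keyed by term URI.
current: Dict[str, Term] = {
    "ex:termA": {"status": "stable", "comment": "Original definition."},
}

# Hypothetical pending release, stored only as per-term changes.
# A value of None would mark a pending removal.
pending: Dict[str, Optional[Term]] = {
    "ex:termA": {"comment": "Revised definition."},           # change to a stable term
    "ex:termB": {"status": "pending", "comment": "New term."},  # brand-new term
}

def preview(current: Dict[str, Term], pending: Dict[str, Optional[Term]]) -> Dict[str, Term]:
    """Return the schema as it would look with the pending changes applied."""
    merged = {uri: dict(term) for uri, term in current.items()}  # copy, leave current intact
    for uri, change in pending.items():
        if change is None:
            merged.pop(uri, None)
        else:
            merged.setdefault(uri, {}).update(change)
    return merged

def release(current: Dict[str, Term], pending: Dict[str, Optional[Term]],
            new_status: str = "stable") -> Dict[str, Term]:
    """Apply the overlay and promote every "pending" term in one step."""
    merged = preview(current, pending)
    for term in merged.values():
        if term.get("status") == "pending":
            term["status"] = new_status
    return merged

# Terms with pending changes, including one that already exists as stable.
print(sorted(pending))  # ['ex:termA', 'ex:termB']
```

Under these assumptions, a stable term can appear under pending simply because it has an entry in the overlay, and the status change at release time happens in one place rather than by hand.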

Issues affecting the schema directly:
#638 Enable inverse property declarations
- Essentially this would just be a new field in the system
- Shouldn't be a problem for any existing systems
#675 (Comment) Enable skos:broadMatch/skos:narrowMatch for concepts
- Two new fields in the system
- Shouldn't be a problem for existing systems
#537 Enable better history/change tracking
- This would correct some issues in the graph for history tracking; namely not being able to link to certain objects
#807 Enable skos:relatedMatch
Accommodate the terms/relationships to external schemas identified in this document
Enable annotating a borrowed term using skos:scopeNote

Content management enhancements:

#562 Enable public comments on term tables
- Would need some basic anti-spam prevention and some kind of a display page
#584 Create a means of cataloguing guidance/best practices using a schema such as Data One

Updates to the schema.org mapping for Embeddable Credentials (EOCreds):

#681 Update mapping for ceterms:identifier
- Need to make sure this still makes sense given updates to identifier that have happened since that issue was created
#686 Update mapping for credential type
#687 Update mapping for @type
#688 Update mapping for cost profile

Other Ideas

Automated search of other data sources (e.g. Wikipedia) for matches to term data

stuartasutton commented 2 years ago

@siuc-nate, you state with regard to the need for "cataloguing guidance/best practices" like the best practices with DataOne:

"May be somewhat redundant with the information in the handbooks (or it may replace some of that information)"

I don't agree. Nowhere does the handbook get down in the weeds with best practices. The Handbook and a Best Practices Guide are complementary. Look at the 10 entries here: https://github.com/CredentialEngine/CatalogingGuidance. Check out Issue 6.

The Handbook should not be filled with such advice. The Best Practices Guide could/should be populated with advice in all areas of creating data and picking properties where discretion is exercised.

stuartasutton commented 2 years ago

@siuc-nate, you state with regard to the need for a means to "comment" on each term table (could also be in the release history for terms in pending):

"Would need some basic anti-spam prevention and some kind of a display page"

As for the display page, there would be no need for any sort of public display. Comments that lead to issues can be raised in GitHub issues by those monitoring incoming comments.

We currently have no simple, single, easily discovered, open mechanism for commenting on terms that isn't constraining for some (e.g., GitHub) and that can be found near the points in the terms and other documentation where comments might arise. I've been around this project for a while, and I don't know where comments on terms should be registered except on GitHub. Perhaps I SHOULD know and don't...so consider me the canary in the coal mine that others haven't got a clue either.

siuc-nate commented 2 years ago

@stuartasutton

> I don't agree. Nowhere does the handbook get down in the weeds with best practices. The Handbook and a Best Practices Guide are complementary. Look at the 10 entries here: https://github.com/CredentialEngine/CatalogingGuidance. Check out Issue 6.

Got it, updated the original post

stuartasutton commented 2 years ago

As for #638 Enable inverse property declarations, the issue is not so much whether we can include the inverseOf property in terms of CTDL declarations, but rather how those declarations are handled by the Registry. Does the registry automatically create the inverse data when it encounters actual data for an inverseOf property; e.g.

- Schema declaration: "husband" inverseOf "wife"
- Data in DB: "Shakespeare wife Hathaway"
- Query 1: "Who is Shakespeare's wife?"
- Query 2: "Who is Hathaway's husband?" (inverse)

Will we be able to do Query 2?

siuc-nate commented 2 years ago

It does not, mostly because the data comes from many sources and our policy requires primary-source (directly or by proxy) information. Allowing automatic inverse connections would:

- Create extra overhead in terms of figuring out what all of the connections should be when data is published, updated, and/or removed
- Violate the record signatures in the registry
- Require records to contain/searches to operate on assertions that were not made by the publishers (who would be responsible for those? Credential Engine?)
- Allow for false data (e.g. a credential that incorrectly claims (credential)-[accreditedBy]->(some org) would auto-generate a (some org)-[accredits]->(credential) connection)
- Probably lead to other unforeseen consequences

We can still accommodate your queries though, since the search API enables crawling connections in reverse:

//Given this data
(person:Shakespeare)-[hasWife]->(person:Hathaway)

//Find Shakespeare's wife 
//(literally: return all objects where the "hasWife" connection originates from "person:Shakespeare")
{
  "^hasWife": {
    "@id": "person:Shakespeare"
  }
}

//Find Hathaway's husband 
//(literally: return all objects where "hasWife" references "person:Hathaway")
{
  "hasWife": {
    "@id": "person:Hathaway"
  }
}
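
As an illustration only, the two lookups above amount to reading the same stored connection from either end. The following is a minimal Python sketch over an in-memory list of triples, not the Registry's search API; the function names `objects_of` and `subjects_of` are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Only the asserted, first-party connection is stored; no inverse is generated.
TRIPLES: List[Triple] = [
    ("person:Shakespeare", "hasWife", "person:Hathaway"),
]

def objects_of(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Follow the connection forward: subject -[predicate]-> ?"""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects_of(triples: List[Triple], predicate: str, obj: str) -> List[str]:
    """Follow the connection in reverse: ? -[predicate]-> obj"""
    return [s for s, p, o in triples if p == predicate and o == obj]

wife = objects_of(TRIPLES, "person:Shakespeare", "hasWife")
husband = subjects_of(TRIPLES, "hasWife", "person:Hathaway")
print(wife)     # ['person:Hathaway']
print(husband)  # ['person:Shakespeare']
```

Both answers come from the single asserted triple; nothing is ever stored or returned under a `hasHusband` predicate.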

stuartasutton commented 2 years ago

Nate, but let's be clear, you are talking about Registry policy. Declaring inverse properties in CTDL has no such policy constraints.

As for the API solving the problem, I don't think your result does what the following intends:

{
  "hasHusband": {
    "@id": "person:Hathaway"
  }
}

In other words, no `Hathaway hasHusband Shakespeare` triple is added to the database when the triple `Shakespeare hasWife Hathaway` is added (as could happen in a triplestore), nor is one inferred at the time of query. While the inverse can be inferred by humans from your result, there's nothing definitive. There is no policy constraint on Shakespeare asserting that Hathaway married him.

siuc-nate commented 2 years ago

> Nate, but let's be clear, you are talking about Registry policy. Declaring inverse properties in CTDL has no such policy constraints.

True, but your question was specifically about the registry:

> Does the registry automatically create the inverse data when it encounters actual data for an inverseOf property; e.g. [...]

> In other words, there is no Hathaway hasHusband Shakespeare triple directly added to the database (as could be in a triplestore) or handled at the time of query.

Correct, because none was asserted in the source/first-party data.

In someone else's implementation that doesn't care about that, they could turn on the automatic inverse calculations and have the generated hasHusband property; but again, you asked about the Registry. In any event, you're right that CTDL doesn't disallow inverse properties, but the schema manager doesn't currently have a means of supporting them (hence the bullet point in the original post above).

stuartasutton commented 2 years ago

@siuc-nate, there is a difference between being able to handle inverse properties and having a policy that says "no". I think we are on the same page there. But there is a layer below that in terms of the Registry where we can do it but have a blanket policy (at the moment) that says we don't.

philbarker commented 2 years ago

I think I am just reiterating what Stuart has said (and maybe what Nate knows), but it's important to remember that declaring hasHusband as an inverse of hasWife does not mean that you have to add a complementary hasWife property every time someone adds a hasHusband property; in fact, it means that you do not have to add one (if you're willing to trust inferences).

Declaring inverse properties would embed the potential of reverse searches into CTDL. It's actually no different to other term-to-term or concept-to-concept relationships like rdfs:subPropertyOf, rdfs:subClassOf, owl:equivalentClass, owl:equivalentProperty, skos:exactMatch, skos:narrower and so on: it's just another relationship that may be used when broadening searches to include results inferred from the schema rather than directly asserted in the data.
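
To make the distinction concrete, here is a rough Python sketch of a search that only uses a declared inverse when the caller opts in. It is a toy model under stated assumptions, not the Registry or any existing implementation: the `INVERSE_OF` table stands in for an inverse declaration in the schema, and `find_objects` and its `infer_inverse` flag are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Asserted data only; nothing is added when an inverse is declared.
ASSERTED: List[Triple] = [
    ("person:Shakespeare", "hasWife", "person:Hathaway"),
]

# Hypothetical schema-level declaration: hasHusband is the inverse of hasWife.
INVERSE_OF: Dict[str, str] = {"hasHusband": "hasWife", "hasWife": "hasHusband"}

def find_objects(subject: str, predicate: str, infer_inverse: bool = False) -> Set[str]:
    """Return objects of subject -[predicate]-> ?, optionally broadened by inference."""
    results = {o for s, p, o in ASSERTED if s == subject and p == predicate}
    if infer_inverse and predicate in INVERSE_OF:
        inverse = INVERSE_OF[predicate]
        # Read asserted inverse connections backwards; the data itself is unchanged.
        results |= {s for s, p, o in ASSERTED if p == inverse and o == subject}
    return results

strict = find_objects("person:Hathaway", "hasHusband")
broadened = find_objects("person:Hathaway", "hasHusband", infer_inverse=True)
print(strict)     # set()
print(broadened)  # {'person:Shakespeare'}
```

In this sketch the declaration lives in the schema and the decision to use it lives in the search, so a system that wants only asserted data simply leaves the flag off.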

siuc-nate commented 2 years ago

@stuartasutton Yes, I think we're saying the same thing.

@philbarker Whether the data is actually present, or appears to be present because it's inferred, the result is the same (at least as far as the search API goes): it will appear that there are inverse connections where none exist in the real data. That leads to a (QA Org)-[accredits]->(credential) appearing to be there any time a credential (perhaps incorrectly) asserts (credential)-[accreditedBy]->(QA Org). We don't want false/unconfirmed inverse assertions to be part of the data set, inferred or not.

philbarker commented 2 years ago

@siuc-nate

> @philbarker Whether the data is actually present, or appears to be present because it's inferred, the result is the same (at least as far as the search API goes): it will appear that there are inverse connections where none exist in the real data. That leads to a (QA Org)-[accredits]->(credential) appearing to be there any time a credential (perhaps incorrectly) asserts (credential)-[accreditedBy]->(QA Org).

That would only happen if you choose (or chose) to implement the search API that way. If it's not the behavior you want, why would you choose to do it?

siuc-nate commented 2 years ago

We wouldn't; that's the point I was making.

philbarker commented 2 years ago

So I don't see the problem. It's a change to CTDL that won't affect the registry.

philbarker commented 2 years ago

I think it was a mistake to merge the discussion of over a dozen different issues into one thread. Projects and tags would be a better way of keeping track of several issues that relate to the same component.

siuc-nate commented 2 years ago

> It's a change to CTDL that won't affect the registry.

Agreed. But Stuart asked me about the registry, which is why it came up.

> I think it was a mistake to merge the discussion of over a dozen different issues into one thread.

I disagree. The majority of these had only one post in their respective threads and are small enough items that I don't see a problem aggregating them together. Projects and tags mean we'd still have a dozen extra issues scattered throughout our issue list. We can reopen a closed issue if it becomes significant enough.

siuc-nate commented 2 weeks ago

Archiving this to reduce clutter. It will still be a very useful checklist of things for the new schema manager to be able to do, once I have time to resume working on it.

CredentialEngine / Schema-Development

Schema Manager Updates #800
