cncf / landscape-graph

CNCF Landscape Graph, data model, and applications.
https://github.com/orgs/cncf/projects/7/views/6

design: Sub-Graph Modules (sgm) #54

Closed halcyondude closed 2 years ago

halcyondude commented 2 years ago

Sub-Graph Modules

Goals

Tasks

Types of Sub-Graph Modules (SGM)

Each of these is an interface, acting as a base class with shared properties. Reasons to structure the model this way include:

  1. It enables treating classes of things polymorphically while leaving each concrete instance's portion of state undisturbed.
  2. It lowers the barrier to entry for new contributions.
  3. It bounds the blast radius of a change within the model as a whole.
  4. It facilitates pruning and cardinality reduction of the test surface required to validate changes in CI. Because even casual data sets can be non-trivial in size and cost, an intentional and structured approach is warranted.

| base types | derived types |
|---|---|
| blogs | CNCF, thenewstack, medium.*, LinkedIn Posts, ... |
| boards | GH Discuss, StackOverflow |
| corp | crunchbase, yahoofinance |
| email | cncf project lists, k8s lists |
| packages | brew, choco, crate, deb, deno, go, maven, npm, pip, rpm |
| rtc | slack, discord, gitter |
| social | twitter, linkedin |
| threats | nist |
| learning | youtube, books, online courses (public / open only!) |
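
As a rough illustration of the base/derived split, one base type could be written as a GraphQL interface with each derived type implementing it. This is only a sketch: the type and field names (`Blog`, `CncfBlogPost`, `MediumBlogPost`, `url`, `publishedAt`, `tags`, `claps`) are hypothetical and are not taken from the project's schema.

```graphql
# Hypothetical sketch: names and fields are illustrative assumptions,
# not the landscape-graph schema.

# Base type: the shared properties every blog-like module exposes.
interface Blog {
    title: String!
    url: String!
    publishedAt: String # ISO 8601 timestamp, kept as a String for simplicity
}

# Derived type: adds module-specific state without touching the base.
type CncfBlogPost implements Blog {
    title: String!
    url: String!
    publishedAt: String
    tags: [String!]!
}

# A second derived type with its own extra field.
type MediumBlogPost implements Blog {
    title: String!
    url: String!
    publishedAt: String
    claps: Int
}
```

A consumer that only cares about `title` and `url` can treat both derived types as `Blog`, while each module keeps its extra fields to itself.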

Each module shall have:

Taking this approach facilitates creation of a rich set of capabilities impacting model training, CI, and developer experience.

By using snapshots of the graph (Graph Projections TODO doc link) in a manner similar to virtual machine snapshot trees (esx, hyper-v, ...), CI can

We'll also benefit from a sustainable, portable, usable data model that is documented.

(TODO: update w/ final set)

.
├── blogs
│   └── sgm-blogcncf
├── boards
│   ├── sgm-ghdiscuss
│   └── sgm-stackoverflow
├── core
│   └── generated
├── corp
│   ├── sgm-crunchbase
│   └── sgm-yahoofinance
├── email
├── packages
│   ├── sgm-brew
│   ├── sgm-choco
│   ├── sgm-crate
│   ├── sgm-deb
│   ├── sgm-deno
│   ├── sgm-go
│   ├── sgm-maven
│   ├── sgm-npm
│   ├── sgm-pip
│   └── sgm-rpm
├── rtc
│   ├── sgm-discord
│   └── sgm-slack
├── social
│   ├── sgm-linkedin
│   └── sgm-twitter
├── threats
│   └── sgm-nist
└── learning
    └── sgm-youtube

ACTIVE DEVELOPMENT

Closely related to this issue is: https://github.com/cncf/landscape-graph/issues/4 (branch)

How GraphQL Interfaces Work

https://neo4j.com/docs/graphql-manual/current/type-definitions/interfaces/#_directive_inheritance

Any directives present on an interface or its fields will be "inherited" by any object types implementing it. For example, the type definitions above could be refactored to have the @relationship directive on the actors field in the Production interface instead of on each implementing type as it is currently:

interface Production {
    title: String!
    actors: [Actor!]! @relationship(type: "ACTED_IN", direction: IN, properties: "ActedIn")
}

type Movie implements Production {
    title: String!
    actors: [Actor!]!
    runtime: Int!
}

type Series implements Production {
    title: String!
    actors: [Actor!]!
    episodes: Int!
}

interface ActedIn @relationshipProperties {
    role: String!
}

type Actor {
    name: String!
    actedIn: [Production!]! @relationship(type: "ACTED_IN", direction: OUT, properties: "ActedIn")
}
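
For a sense of what the interface buys at query time, here is a sketch of a client query against the type definitions above. It assumes the library generates a top-level `actors` query field for the `Actor` type. The operation name is made up for illustration.

```graphql
# Sketch only: assumes a generated `actors` query field for type Actor.
query ActorsWithProductions {
    actors {
        name
        actedIn {
            # Shared field declared on the Production interface.
            title
            # Type-specific fields, selected per implementing type.
            ... on Movie {
                runtime
            }
            ... on Series {
                episodes
            }
        }
    }
}
```

The shared `title` field is selected once on the interface, and the inline fragments pick up whatever each concrete type adds.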

https://neo4j.com/docs/graphql-manual/current/type-definitions/interfaces/#_overriding

In addition to inheritance, directives can be overridden on a per-implementation basis. Say you had an interface defining some Content, with some basic authorization rules:

interface Content
    @auth(rules: [{ operations: [CREATE, UPDATE, DELETE], allow: { author: { username: "$jwt.sub" } } }]) {
    title: String!
    author: [User!]! @relationship(type: "HAS_CONTENT", direction: IN)
}

type User {
    username: String!
    content: [Content!]! @relationship(type: "HAS_CONTENT", direction: OUT)
}

type PublicContent implements Content {
    title: String!
    author: [User!]!
}

type PrivateContent implements Content
    @auth(rules: [{ operations: [CREATE, READ, UPDATE, DELETE], allow: { author: { username: "$jwt.sub" } } }]) {
    title: String!
    author: [User!]!
}

Core Data Model

core-png

jexp commented 2 years ago

That's why labels are like tags. You can add them on the fly and they are useful to tag things, group them or denote status etc.

So you don't have to create a complex ontology structure; just tag your nodes with the labels that represent the roles they play.

AlexxNica commented 2 years ago

Hey there, @halcyondude! Congrats on the awesome work and research you're doing! I'm using a lot of your research to guide my own, which I started a while back for a Filecoin Plus project.

Through my research, I'm trying to follow similar concepts you've described, and I feel like the "Sub-Graph Modules" concept would benefit from the Apollo Federation architecture. Do you know about it already? If not, taking a look may be worth it!

Here are some starting points:

One thing, though, is that it seems Neo4j doesn't readily integrate with it, but it seems easy to make it do so. Here's a repository from the Apollo team that demonstrates it working with Neo4j: https://github.com/apollosolutions/neo4j-subgraph

halcyondude commented 2 years ago

Hi there! I had a look at Apollo Federation, and while it's pretty cool, I'm not sure it's the best fit here, mainly because subscriptions are not supported. However, much of the conceptual information on schema composition is quite relevant and makes sense.

For now, to keep things simple and avoid another layer of indirection, I'm planning to use GraphQL fragments and a simple directory/manifest approach.
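
A minimal sketch of what a per-module fragment might look like under that approach. Everything here is an assumption made for illustration: the file name, the `Blog` interface with `title` and `url` fields, and the `blogs` query field are hypothetical, and the manifest format itself is not shown because it has not been specified.

```graphql
# blogs/sgm-blogcncf/fragments.graphql (hypothetical file name)

# Reusable selection set owned by the module.
fragment BlogSummary on Blog {
    title
    url
}

# A consumer composes module fragments without any gateway in the runtime.
# `blogs` is an assumed query field, not a confirmed part of the schema.
query RecentBlogs {
    blogs {
        ...BlogSummary
    }
}
```

Each module would ship its own fragments alongside its type definitions, so composition happens in plain files rather than in an extra runtime layer.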

Another concern with Apollo Federation is its impact on the client/query layer: it requires a gateway, and the supergraph/subgraph idiom becomes part of the runtime. Ideally (I posit), the way the data layer is composed should be immaterial to the final data model, with no coupling between the two.

halcyondude commented 2 years ago

https://compiledexperience.com/blog/posts/dynamic-graphql

halcyondude commented 2 years ago

image

AlexxNica commented 2 years ago

@halcyondude That's great! Are you planning on centralizing everything into a single database vendor (in this case, Neo4j for now)?

halcyondude commented 2 years ago

> @halcyondude That's great! Are you planning on centralizing everything into a single database vendor (in this case, Neo4j for now)?

Using Neo4j for the time being, primarily because of APOC and the GDS libraries, and because Neo4j Desktop is freely available as a native experience across platforms, which lowers the barrier to entry for new contributors. Since the GraphQL JavaScript library does all the translation from GraphQL to openCypher, the real standardization is on openCypher, which other graph databases implement as well (https://opencypher.org/projects).

halcyondude commented 2 years ago

Recently published: Apollo's stack just leveled up in terms of OSS offerings...

https://www.apollographql.com/blog/announcement/backend/the-supergraph-a-new-way-to-think-about-graphql

https://www.apollographql.com/blog/announcement/backend/apollo-router-our-graphql-federation-runtime-in-rust

It's likely that after a more detailed review we'll move forward with some of these components.

halcyondude commented 2 years ago

Update, after findings in https://github.com/cncf/landscape-graph/issues/4

The SGMs implemented (initially) are exposed to the supergraph via GraphQL endpoints. This provides strong typing and interoperability with a deep canon of existing libraries and algorithms, including ontology/taxonomy/semantic frameworks as well as documentation and visualization tools.

A variety of back-end database stores can be used behind that interface. In my view, a graph database with the neo4j-graphql JavaScript library is compelling, as it removes the need to write resolvers, handle sorting/filtering/pagination, and author mutations. Being able to drop into Cypher via the @cypher directive is a nice escape hatch, as well as a way to expose Cypher query capabilities and results as GraphQL.
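
To make the Cypher escape hatch concrete, here is a small sketch of a field backed by a Cypher statement through the library's `@cypher` directive. The `Project` type, the `contributorCount` field, the `Person` label, and the `CONTRIBUTES_TO` relationship are hypothetical names chosen for illustration. Recent releases of the library expect the `columnName` argument, while older releases may not accept it, so adjust to the version in use.

```graphql
# Hypothetical sketch: type, field, label, and relationship names are
# illustrative assumptions, not the landscape-graph schema.
type Project {
    name: String!

    # Computed by Cypher instead of a hand-written resolver.
    # `this` refers to the Project node currently being resolved.
    contributorCount: Int
        @cypher(
            statement: """
            MATCH (this)<-[:CONTRIBUTES_TO]-(p:Person)
            RETURN count(p) AS contributorCount
            """
            columnName: "contributorCount"
        )
}
```

Clients then select `contributorCount` like any other field, without knowing that a Cypher statement produced the value.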