GraphQL and variable schemas

hluz commented 8 years ago

Maybe I'm missing the obvious, but how will we be able to retrieve / mutate data with variable schema? Consider the following scenario:

A Mongo collection where each document contains the schema definition for a sub-document. This schema is not fixed but varies from document to document. How can we retrieve such documents when we don't know in advance its schema?

Example collection:

[
  {
    "_id": "1",
    "name": "A",
    "fldDef": [
      {
        "name": "f1",
        "type": "str",
        "opt": false
      },
      {
        "name": "f2",
        "type": "bool"
      }
    ],
    "flds": {
      "f1": "value1",
      "f2": true
    }
  },
  {
    "_id": "2",
    "name": "B",
    "fldDef": [
      {
        "name": "f3",
        "type": "str",
        "opt": true
      },
      {
        "name": "f4",
        "type": "str",
        "opt": true
      },
      {
        "name": "f5",
        "type": "num"
      }
    ],
    "flds": {
      "f3": "value3",
      "f5": 25
    }
  }
]

dmitry commented 8 years ago

we don't know in advance its schema

Do you mean the data is not strictly structured?

ghost commented 8 years ago

I think he means documents with different fields in the same collection.

I imagine it can be hard to query for data when you don't know explicitly what to query for. Maybe there's some kind of wildcard in GraphQL that would allow implicit field queries?

hluz commented 8 years ago

I mean that in schema-less DBs like Mongo, you don't have to define a schema that constrains the structure of the documents in a collection. So you can have two documents in the same collection with different structures. Since the documents are stored in a JSON-like format, they are self-described: each document contains both the name and the value of each field or sub-document. You will know the structure when you retrieve the document, but GraphQL requires the structure to be part of the query. So how do we use GraphQL with those DBs?

dmitry commented 8 years ago

Check out this one: https://github.com/facebook/graphql/issues/101

hluz commented 8 years ago

Gee... That is frightening... Such a lengthy discussion and surfaced complexity to solving stuff as basic as retrieving all values of a field array? :-(

justinsb commented 8 years ago

GraphQL encourages use of a well-defined schema. There are definitely benefits to doing so (and trade-offs).

In your example, you would probably currently just tweak the "variable" bit of your schema to be

"flds": [
      { "name": "f1", "value": "value1" },
      { "name": "f2", "value": "true" }
    ]
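The remapping does not have to touch the stored data; it can happen in the resolver. A minimal sketch, assuming documents shaped like the example collection above, where `flds` stays an object in Mongo and is only converted on the way out. The helper name `fldsToPairs` is hypothetical, and the stringification mirrors the `"true"` in the suggestion above.

```javascript
// Hypothetical resolver helper: turn the stored `flds` object into the
// [{ name, value }] shape suggested above, without changing the database.
// Values are stringified so a GraphQL field declared as String can hold them,
// which is exactly where the original types get lost.
function fldsToPairs(flds) {
  return Object.entries(flds || {}).map(([name, value]) => ({
    name,
    value: String(value),
  }));
}

const exampleDoc = { _id: "1", name: "A", flds: { f1: "value1", f2: true } };
const pairs = fldsToPairs(exampleDoc.flds);
// pairs[1].value is the string "true", no longer the boolean true
```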

ianserlin commented 8 years ago

Yes, it is sort of like Mustache's (the precursor to Handlebars) dogmatic refusal to provide any conditional logic within a UI template. Perfecting one theory can be (and often is) done at the expense of the development experience.

hluz commented 8 years ago

GraphQL encourages use of a well-defined schema. There are definitely benefits to doing so (and trade-offs).

Encouraging is fine. Requiring it, not so much.

In your example, you would probably currently just tweak the "variable" bit of your schema to be
"flds": [
      { "name": "f1", "value": "value1" },
      { "name": "f2", "value": "true" }
    ]
Sure. In the end, anything can be represented as key/value pairs, although that does not make it very practical to query and manipulate... (btw, what are GraphQL's array search capabilities like?)

But the main issue is that people can't afford to restructure the database just because a supposedly db-agnostic query/manipulation syntax cannot handle current db structures. Can you imagine the rippling impact to other applications that use that data?

In my opinion, attempting to be 100% compatible with Facebook's spec is fine. We should support the spec, but that should not stop us from having extensions to it that cover shortfalls like this. We need a way to specify the equivalent of Mongo projections without having to use a schema, even if only for Mongo, and to be able to use query criteria on fields not defined in the schema.

justinsb commented 8 years ago

@hluz the GraphQL structure does not need to reflect the database structure.

hluz commented 8 years ago

I understand that, but having to remap it on the server side adds complexity for no benefit.

hluz commented 8 years ago

justinsb closed this an hour ago

@justinsb, does the closing of the issue mean that there will be no support to retrieve a mongo document (in its JSON format) via GraphQL when the schema is unknown?

ianserlin commented 8 years ago

@justinsb Your last comment seems to be demonstrating a lack of concern and completeness for resolving legitimate questions.

The question was: Can I use GraphQL to request all child data (properties) for a parent field without specifying the name of each child property explicitly.

Your first answer was: You can change how the child data is stored so that you do know the name of each property and can request it explicitly.

The question then became: Can I use GraphQL to request all child data (properties) for a parent field without specifying the name of each child property explicitly. (yes, the same question)

Your next comment

@hluz the GraphQL structure does not need to reflect the database structure.

is hard to accept, because we all know that already (it's one of the points of GraphQL), yet we still had this question to ask. It's still not an explicit answer, just like "encourages the use of" isn't either. It's a statement of what you think GraphQL is about.

An explicit answer would be: No, you cannot.

There is a whole universe of more useful answers:

No, not in the core library for this repo, and/but:

the core library will support extensions
you can use field parameters
you can write your own query resolver (to pull data)
our query standard resolver that supports mongo will be extensible
this issue belongs in the meteor forum
this issue belongs on a different repo
see this other issue in this other repo
we'll think about it and get back to you
...
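To make the "field parameters" option concrete: a field could take a list of dotted paths as an argument and return the projected sub-document as one opaque JSON value. A minimal sketch of the projection logic only, assuming a hypothetical `doc(id: ID!, paths: [String!]!): JSON` field and an in-memory `docs` array in the resolver context; a real resolver would pass an equivalent projection to Mongo instead.

```javascript
// Project a document onto dotted paths, similar in spirit to a Mongo
// projection such as { "flds.f5": 1 }. Paths that do not exist are skipped.
function projectPaths(doc, paths) {
  const out = {};
  for (const path of paths) {
    const keys = path.split(".");
    let src = doc;
    for (const k of keys) {
      src = src != null && typeof src === "object" ? src[k] : undefined;
    }
    if (src === undefined) continue;
    // Rebuild the nested structure in the output object.
    let dst = out;
    for (const k of keys.slice(0, -1)) {
      dst[k] = dst[k] || {};
      dst = dst[k];
    }
    dst[keys[keys.length - 1]] = src;
  }
  return out;
}

// Hypothetical resolver map for `doc(id: ID!, paths: [String!]!): JSON`.
const docResolvers = {
  Query: {
    doc: (_parent, { id, paths }, { docs }) =>
      projectPaths(docs.find((d) => d._id === id) || {}, paths),
  },
};
```

The client still has to name paths, but it names them as data (an argument value) rather than as part of the query's shape, so the server schema does not need to know them in advance.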

Maybe none of those answers are possible, and we don't all have a lot of time to spend writing tutorials in github issue comments AND do our jobs, but...

It is your (and MDG's) job. Especially when creating a project like this.

It's not your job to implement everything people ask for of course, but remember who you are building any of this for.

Simply closing this issue in this manner gives the impression that you don't care and leaves the community to go elsewhere to try and figure out on their own and guess at what you might or might not do/think/say in this regard. That's really not useful at all.

hluz commented 8 years ago

Oh, btw: the suggested remapping, returning a made-up schema definition supporting

"flds": [
      { "name": "f1", "value": "value1" },
      { "name": "f2", "value": "true" }
    ]

and so that

the GraphQL structure does not need to reflect the database structure

is not at all equivalent to having the original fields returned. Note that we had to coerce non-string values to strings (because the client does not know the schema to specify it). So false becomes "false", a date becomes some string representation whose format the client will not know, etc.

Yeah, of course the server could also pass the introspected original format and mapped format so that the client could re-cast it back, in which case we could have something like:

"flds": [
      { "name": "f1", "value": "value1", "type":"str", "fmt":null },
      { "name": "f2", "value": "true", "type":"bool", "fmt":"true?false"},
      { "name": "f3", "value": "12/01/2016", "type":"date", "fmt":"dd/mm/yyyy" }
    ]

So is that the recommended approach? Seriously?
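For what it's worth, the type-tagged variant can be made reversible without a separate `fmt` field by JSON-encoding each value. A minimal sketch, assuming the client may `JSON.parse` the `value` string; the `str`/`bool`/`num` tags reuse the vocabulary of the `fldDef` example, while `date` (carried as an ISO 8601 string) and `json` are hypothetical additions.

```javascript
// Encode a field as { name, value, type }. `value` is always a string, so it
// fits a fixed GraphQL type, but it can be decoded back without loss.
function encodeField(name, raw) {
  if (raw instanceof Date) {
    return { name, value: raw.toISOString(), type: "date" };
  }
  const type =
    typeof raw === "boolean" ? "bool" :
    typeof raw === "number" ? "num" :
    typeof raw === "string" ? "str" : "json";
  return { name, value: JSON.stringify(raw), type };
}

// Client side: recover the original JavaScript value from the encoded pair.
function decodeField({ value, type }) {
  return type === "date" ? new Date(value) : JSON.parse(value);
}

const encoded = encodeField("f2", true);
// encoded.value is the string "true"; decodeField(encoded) is the boolean true
```

This removes the guesswork about formats, but it does not remove the objection above: the client still receives an encoding rather than the original fields.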

hluz commented 8 years ago

That's really not useful at all.

... especially when this repo was created to promote discussion of concerns / implementation options. At least that's how it was announced.

Never mind...

Koleok commented 8 years ago

@ianserlin @hluz I don't think the tone of this discussion is very productive. The original question is a legitimate concern, but it's one that some independent investigation into, and experimentation with, the GraphQL spec will resolve for you pretty quickly.

You can easily do whatever transformations are necessary in the GraphQL resolver to shape the data the way you need it, be that building in-memory relations, dumping unaccounted-for fields into a catch-all property, or literally anything else you can come up with. The point is that GraphQL is not tying you down or stealing any existing freedoms afforded by the pub/sub model we have become comfortable with.
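The catch-all idea might look like this in a resolver. A minimal sketch, assuming the GraphQL type declares a few fixed fields plus one JSON-valued field (here called `extra`, a hypothetical name) that receives everything else; it needs a JSON scalar in the schema, which core GraphQL does not ship.

```javascript
// Split a stored document into the fields the GraphQL type declares and a
// catch-all object holding everything else, to be exposed as `extra: JSON`.
function withCatchAll(doc, declaredFields, catchAllName = "extra") {
  const result = {};
  const rest = {};
  for (const [key, value] of Object.entries(doc)) {
    if (declaredFields.includes(key)) result[key] = value;
    else rest[key] = value;
  }
  result[catchAllName] = rest;
  return result;
}

// Usage: only `_id` and `name` are declared on a hypothetical `Doc` type.
const shaped = withCatchAll(
  { _id: "1", name: "A", flds: { f1: "value1", f2: true } },
  ["_id", "name"]
);
// shaped.extra.flds.f2 is still the boolean true
```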

@justinsb probably closed this issue because it is not an issue with the viability of the conceptual model that has been proposed. I don't mean to be indelicate, but rather than making it MDG's job to scramble to solve the problem you are seeing with this use case, take a step back and come up with a solution. I can promise you that you don't have to change your persisted data (note: if you do end up doing this, it is most likely because you recognized a design flaw that already existed), and that you will not be sacrificing performance compared to existing pub/sub.

Keep in mind the insightful distinctions between simple and easy outlined in this awesome Rich Hickey talk ("Simple Made Easy"). We are all in this together, man.

ianserlin commented 8 years ago

@Koleok I attempted to skip character-assassination and not claim to actually know anything about @justinsb by just talking about perception of a particular action. Words on paper...

Anyway, thank you for this:

You can easily do whatever transformations are necessary in the GraphQL resolver to shape the data the way you need it, be that building in-memory relations, dumping unaccounted-for fields into a catch-all property, or literally anything else you can come up with.

That provides the context needed to continue independent investigations and/or continue discussing possible patterns to accommodate this use case (including simply using "traditional" pub/sub instead of GraphQL for them). For me, that is actually enough for now until I get to play with the actual interface.

Regarding:

I don't mean to be indelicate, but rather than making it mdg's job to scramble to solve the problem you are seeing with this use case, take a step back and come up with a solution.

Ok, so, we all solve problems every single day. In fact, that's what we're attempting to do here.

It is MDG's job to develop Meteor core and it seems like this repo will be a part of that.

But, I'm not trying to coerce MDG into creating a solution for this use case right now, before the code is written, or even within any specific time period... simply to continue discussing it. I'm not just practicing writing words here.

If you're saying this is not the right place to have this discussion, that's cool, so be it, it can move to the forums or wherever else easily.

Koleok commented 8 years ago

If you're saying this is not the right place to have this discussion, that's cool, so be it, it can move to the forums or wherever else easily.

@ianserlin I think this is a valid place to pose the question; it just seemed like things were snowballing toward implying that MDG was somehow sweeping this under the rug, or that it was a fundamental hole in this project. I don't think that is the case, and didn't want others viewing this discussion to come away with that impression.

Ok, so, we all solve problems every single day. In fact, that's what we're attempting to do here.

This is a good point. I got a little riled up there, but yeah, we are all just trying to figure this thing out. I do think this scenario would be worthy of a gist / blog post / forum thread somewhere that could go into detail, since many folks will encounter it.

It may even be worth establishing a re-usable strategy by publishing a helper module on npm like graphql-schemaless-query or something, just some extensions in the vein of graphql-sequelize

stubailo commented 8 years ago

Reopening because I think there is still a valuable discussion to be had here. I'm currently on vacation but will come back to it next week.

AdamBrodzinski commented 8 years ago

@hluz

How can we retrieve such documents when we don't know in advance its schema?

First, I think it's not a great signal to be throwing data in all willy nilly, and a bit of planning can go a long way. Just because the database doesn't enforce a schema doesn't mean an application layer shouldn't.

In development I'll typically add a field to GraphQL and use it; perhaps I change it, and then I reset the DB and change the GraphQL schema. Once it's live, the field would be deprecated and a new field added (for example, pid was deprecated and renamed to postId so that legacy apps wouldn't break and new ones would use the new field).

If one document's field returns a string and another's returns a number, there is no way to predict what will happen in an app, and it generally makes for extremely brittle software. Things in an app will change, of course, and adding a new field and deprecating another goes a long way.

The type system is a feature of GraphQL and shouldn't be viewed as a hassle... it adds integrity and reduces bugs.

GraphQL is very flexible: it will not crash if a field is missing from an object, it will just return null for that field in the document that lacks it. For example, given this data in the db:

"fldDef": [
      {
        "name": "f1",
        "type": "str",
        "opt": false
      },
      {
        "name": "f2",
        "type": "bool"
      }
    ],

a GraphQL response would return:

"fldDef": [
      {
        "name": "f1",
        "type": "str",
        "opt": false
      },
      {
        "name": "f2",
        "type": "bool",
        "opt": null
      }
    ],

This means you'll need to have your UI check for null conditions if this is the case (you can also require all fields when inserting to prevent this if it is truly required). This output is much more predictable now.
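The null-for-missing behaviour can be modelled in a few lines. This is a rough illustration of what GraphQL's default field resolution does for nullable fields, not graphql-js itself; if `opt` were declared non-null (`Boolean!`), GraphQL would instead report an error for the second entry.

```javascript
// Rough model of the default resolver for nullable fields: a property that is
// missing on the source object comes back as null in the response.
function selectFields(source, fieldNames) {
  const out = {};
  for (const name of fieldNames) {
    out[name] = source[name] === undefined ? null : source[name];
  }
  return out;
}

const fldDef = [
  { name: "f1", type: "str", opt: false },
  { name: "f2", type: "bool" },
];
const response = fldDef.map((def) => selectFields(def, ["name", "type", "opt"]));
// response[1].opt === null, matching the GraphQL response shown above
```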

Having GraphQL for Mongo/Rethink/etc.. is actually quite nice because you can now easily see what should go into and come out of the database. Going back into code 1 year ago it's impossible to tell what a Meteor method would return without doing a lot of digging.

If the app truly doesn't need a schema for throw away prototyping purposes then GraphQL might not be the best solution.

ghost commented 8 years ago

I think GraphQL, given its schema approach to retrieving data, should offer the flexibility of schemaless data by using wildcards... I dunno though, what would/should that kind of API look like?

hluz commented 8 years ago

@AdamBrodzinski

First, I think it's not a great signal to be throwing data in all willy nilly, and a bit of planning can go a long way. Just because the database doesn't enforce a schema doesn't mean an application layer shouldn't.

Having a long background in data modelling at large corporations, some with literally thousands of different applications and diverse data stores, I think I understand the implications (+ or -) of schema design on maintainability (and other -ilities) quite well, but thanks for the education attempt ;-)

However, the client not knowing a schema beforehand does not mean that the schema is undefined or non-existent, just that it may not be known at the time. If you read my example properly (or maybe I did not explain it clearly enough), you will notice that the data store keeps the schema of the variable part of each document (a sub-document) in the document itself, as another sub-document. This is not the place to discuss the reasons for or merits of doing so, but there are reasons for it being this way; happy to explain them out-of-band if you are interested. Of course, if the schema sub-document could be retrieved first and then used to retrieve the variable sub-document, we would not be having this conversation. But doing so from the client (although possible as a solution) incurs two serialised network round trips to the server for each document. Note that even if the schema sub-document were kept separate from the document, a client would still need to retrieve the schema part first in order to include it in the GraphQL query.
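The second of those two round trips could be sketched like this: the client builds the selection for `flds` from the `fldDef` list it fetched first. A minimal sketch with a hypothetical `doc(id: ID!)` root field; it also assumes the server's `flds` type actually declares these fields (for example, a schema generated from the same definitions), because otherwise the query fails validation no matter how it is built.

```javascript
// Build a query for the variable sub-document from field definitions that
// were retrieved in an earlier request. GraphQL names must match this pattern.
const GRAPHQL_NAME = /^[_A-Za-z][_0-9A-Za-z]*$/;

function buildFldsQuery(fldDef) {
  const names = fldDef
    .map((def) => def.name)
    .filter((name) => GRAPHQL_NAME.test(name)); // drop names GraphQL cannot express
  return `query Doc($id: ID!) { doc(id: $id) { flds { ${names.join(" ")} } } }`;
}

const query = buildFldsQuery([
  { name: "f3", type: "str", opt: true },
  { name: "f4", type: "str", opt: true },
  { name: "f5", type: "num" },
]);
// query === 'query Doc($id: ID!) { doc(id: $id) { flds { f3 f4 f5 } } }'
```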

Explaining the reasons behind a specific scenario is not directly relevant to this question; what is relevant is being able to do something that is one of the advantages of self-described documents, which JSON-based databases enable. And it looks like GraphQL, by enforcing the need for a predefined schema (rather than enforcing the schema only where one is present), is less useful than it could be as it stands.

My point is that it would be advantageous to Meteor to have some way (maybe via an extension to the GraphQL spec) to cater for these scenarios (which by the way, being easy to implement in Mongo, may be common in many existing Meteor apps) while getting all the other benefits from using GraphQL. And I do believe that this would not be hard to implement (getting the GraphQL specification custodians to accept it or even consider it important enough to spend any time looking at it, may be the hardest part).

Having GraphQL for Mongo/Rethink/etc.. is actually quite nice because you can now easily see what should go into and come out of the database. Going back into code 1 year ago it's impossible to tell what a Meteor method would return without doing a lot of digging.

So you think that "enforcing" the use of a predefined schema on a storage engine that caters for self-described documents is ok, based on 'documentation benefits' alone? Please note that I did not say it is useless to have the capability to enforce schemas where wanted. I do agree that being able to use GraphQL with predefined schemas on Mongo/Couch/Rethink/etc. is quite useful, apart from any argument about where enforcement should occur. Being able to "easily see what should go into and come out of the database", as you put it, is not even the most useful benefit here... and please note that this only constrains the client, not the server or the database: false sense of security and all that.

BTW, document validation enforced by the db layer is now available in Mongo 3.2 as a preview.

ianserlin commented 8 years ago

Ok, ok, there is a legitimate use case for loading data whose property names you don't know/care about before hand.

Here is mine:

I am building an API which allows clients to only manipulate properties of documents in a mongo collection under each document's .metadata object. e.g.

{
    _id: '1235',
    propertyOne: 'not changeable by the client',
    propertyTwo: 'not changeable by the client',
    metadata: {
         // as the client, you can put whatever data you want in here
    }
}

Within that object they can store whatever fields they wish that makes sense for their application. Because they know what fields they are storing in metadata, it is not their app which runs into this limiting factor of GraphQL.

The admin/management application that I build/run, which allows inspection of the .metadata properties in an "admin" UI, does not (traditionally, in Meteor et al.) need to know beforehand which fields exist in a document's .metadata object in order to display them (by iterating over the object keys) or to do something useful with them, like modifying the values or adding/removing properties.

So far, there are three possible designs highlighted in this discussion if I'd like to primarily use GraphQL to specify the data contract between the client and server in the admin application:

side-load the data with unknown structure only if/when needed using a meteor method or equivalent (my first choice so far)
restructure the .metadata object to be an array of key-value pairs like { name: 'propertyName', value: 'someValue' } (not bad, but the change in storage structure, while making the object easier to deal with via GraphQL, makes it harder to deal with almost everywhere else, as I think @hluz is pointing out)
as properties are added to/removed from the .metadata field, keep a list of those property names somewhere (something like .metadataFields: [ "userId", "color", "sku" ]) and use it to create GraphQL queries when needed (least favorite)
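The bookkeeping behind the third design is small. A minimal sketch on plain objects, with hypothetical helper names; in Mongo the same idea maps to a single update combining `$set`/`$unset` on `metadata.<key>` with `$addToSet`/`$pull` on `metadataFields`.

```javascript
// Keep `metadataFields` in sync whenever a metadata property is added or removed.
function setMetadata(doc, key, value) {
  doc.metadata = { ...doc.metadata, [key]: value };
  // A Set keeps each name once, in first-insertion order.
  doc.metadataFields = Array.from(new Set([...(doc.metadataFields || []), key]));
  return doc;
}

function removeMetadata(doc, key) {
  const { [key]: _removed, ...rest } = doc.metadata || {};
  doc.metadata = rest;
  doc.metadataFields = (doc.metadataFields || []).filter((name) => name !== key);
  return doc;
}
```

The cost is that every write path has to go through these helpers; a write that bypasses them leaves the list stale.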

It's definitely possible that this is not a strong enough use-case to extend the standard GraphQL spec.

Regardless, I'm guessing there are/will be more options (including perhaps re-architecting my app in a better way?), but that somewhat depends on how the API for Meteor's version of a GraphQL resolver evolves.

That's why I'm interested in this topic, to learn more without attempting to dictate what is ultimately implemented. Whatever that ends up being, of course we can always work around it if/when needed.

@Koleok

It may even be worth establishing a re-usable strategy by publishing a helper module on npm like graphql-schemaless-query or something, just some extensions in the vein of graphql-sequelize

Yeah! Sweet link, something like how they are extending their resolver to support the special order and limit args to meet their specific use case is another option.

hluz commented 8 years ago

@ianserlin, interesting... there are some similarities with the use case that prompted this thread. For comparison with your app, here is a bird's-eye description of my case; the approach taken was like your 3rd bullet point (metadataFields: [{...},{...}]):

Users can create documents of predefined types, and at creation time the equivalent of your metadataFields (in fact a schema-like list of objects defining the allowed fields, including name, label, type, validation rules, etc.) is added to the document from predefined settings for the selected document type. The document then contains something like:

{
    _id: '123',
    docType: 'a',
    ...,
    metadataFields: [
      fieldDefObj,
      fieldDefObj
    ],
    metadata: {
         // fields as defined by fieldDefObj above as attributes
    }
}

When one of these user documents is then rendered, its metadataFields (in effect a static schema for the variable part of the document) are used to generate forms to enter/modify the fields they define. The variable fields are stored as properties of the equivalent of your metadata sub-document. And the now-static schema governing the content of each document's metadata field ensures data integrity even in the (from the db's point of view) schema-less part of the document.
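The integrity check described here can be sketched as a function that validates a document's `metadata` against its own field definitions. This assumes the simple `{ name, type, opt }` definition shape from the `fldDef` example at the start of the thread, with a missing `opt` treated as required; real definitions with labels and validation rules would be richer.

```javascript
// Type checks for the type tags used in the fldDef example.
const checks = {
  str: (v) => typeof v === "string",
  bool: (v) => typeof v === "boolean",
  num: (v) => typeof v === "number" && Number.isFinite(v),
};

// Return a list of problems; an empty list means the metadata is valid.
function validateMetadata(metadata, fieldDefs) {
  const errors = [];
  const known = new Set(fieldDefs.map((def) => def.name));
  for (const def of fieldDefs) {
    const value = metadata[def.name];
    if (value === undefined) {
      if (!def.opt) errors.push(`${def.name}: required`);
    } else if (checks[def.type] && !checks[def.type](value)) {
      errors.push(`${def.name}: expected ${def.type}`);
    }
  }
  // Reject properties that the document's own schema does not define.
  for (const key of Object.keys(metadata)) {
    if (!known.has(key)) errors.push(`${key}: not defined`);
  }
  return errors;
}
```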

This scenario is enormously facilitated by Mongo's schema-less capability: Adding a new type of document (by defining the list of field names, type, rules, etc), can be done via an Admin function without the need to change the client code, server code or database structures, while at the same time you can still use the regular database search syntax, including secondary indexes in the metadata sub-document.

Implementing this type of scenario with databases that require predefined schemas, although possible, is a lot more complex at all levels (storage, readability, querying, exporting data, etc.)

Our mission, should we choose to accept it, is to somehow (ideas abound if the will does) have this supported by GraphQL in an efficient and simple way, and no, I don't mean easy :-)

sebakerckhof commented 8 years ago

@ianserlin I'm not arguing against the fact that schemaless can have advantages for certain use cases, but I don't see the problem with:

restructure the .metadata object to be an array of key-value pairs like { name: 'propertyName', value: 'someValue' } (not bad, but the change in storage structure, while making the object easier to deal with via GraphQL, makes it harder to deal with almost everywhere else, as I think @hluz is pointing out)

Why would it be harder to deal with everywhere else? You can basically only do 2 things: 1) Iterate over the object or array, which is easier and faster if it's an array. 2) Display one specific property (object.specificProperty). But since you don't know what's in the metadata object, you can't do this anyway?

ianserlin commented 8 years ago

@sebakerckhof It's a good question. Specifically, I'm calling this "harder": editing or removing a particular existing "field" of an array-acting-as-an-object... than manipulating an object directly.

At a fundamental level though, consider any case where it would be legitimately better to use a hashmap instead of an array. Arrays are really most useful when you care about the ordering of the elements within it, that's what they were designed for. In this case (and many) I don't care about order.

I think accepting the object-as-array-of-objects approach rests on the argument that using an array isn't that much worse. I would agree that's true.

Before I go do that all over the place, I'm wondering if we are expecting that to be the best way we can address this use-case.

1) Iterate over the object or array, which is easier and faster if it's an array.

Faster sure, but the difficulty of writing code to iterate over either is the same.

2) Display one specific property (object.specificProperty). But since you don't know what's in the metadata object, you can't do this anyway?

Mmm... If I have an array of key/value pairs and I'm looking for a specific property to output, I need to search the array for it instead of doing metadata["specificProperty"].

And for all unknown properties, typically, we select/display specific properties by iterating over the properties of an object, like:

var metadata = {
    userId: 5,
    penguins: "exciting",
    isItSaturday: true,
    // ... other properties no one knows exist
};
_.each(metadata, function(value, key){ console.log(key, value); });

This prints all the key/value pairs of the metadata object, and outputting them to HTML instead of the console is essentially the same.
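To put the comparison in code, here is a sketch of read, edit and remove on both shapes, using plain ES2019 (`Object.entries`, `Object.fromEntries`) rather than underscore. The `objOps`/`pairOps` names are illustrative only.

```javascript
// Lookup, edit and removal on a plain object: direct key access.
const objOps = {
  get: (obj, key) => obj[key],
  set: (obj, key, value) => ({ ...obj, [key]: value }),
  remove: (obj, key) => {
    const { [key]: _gone, ...rest } = obj;
    return rest;
  },
};

// The same operations on an array of { name, value } pairs need a search each time.
const pairOps = {
  get: (pairs, key) => {
    const found = pairs.find((p) => p.name === key);
    return found ? found.value : undefined;
  },
  set: (pairs, key, value) =>
    pairs.some((p) => p.name === key)
      ? pairs.map((p) => (p.name === key ? { name: key, value } : p))
      : [...pairs, { name: key, value }],
  remove: (pairs, key) => pairs.filter((p) => p.name !== key),
};

// Converting between the two shapes is a one-liner each way.
const toPairs = (obj) => Object.entries(obj).map(([name, value]) => ({ name, value }));
const toObject = (pairs) => Object.fromEntries(pairs.map((p) => [p.name, p.value]));
```

Either shape works for iteration; the difference shows up in the update helpers, where the array version has to handle "already present" and "not present yet" separately.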

hluz commented 8 years ago

@sebakerckhof, on the encode as key value pairs array alternative, let me add a bit of actual complexity into the scenario to help you grasp the potential implications...

If all fields of the metadata sub-document were scalar, it would be easy... but this is only a simplified case. Each of these fields may in fact be a composite field, and then representing such structures with key/value pairs becomes unwieldy.

A small illustration of what one of those documents could have:

{
    _id: '123',
    docType: 'a',
    ...,
    metadataFields: [
      fieldDefObj, 
      ...,
      fieldDefObj
    ],
    metadata: {
      foo: 25,
      bar: [
        "a",
        "b", 
        "c"
      ],
      baz: {
        bazz: false,
        bazzz: true
      },
      qux: [
        { quux: "val3", quuux: false },
        { quux: "val4", quuux: true }
      ]
   }
}

Now, map that to key value pairs and consider the complexity and implications on things like (using mongo syntax for illustration only):

Docs.find({
  docType: "a",
  "metadata.foo": { "$gt": 12 },
  "metadata.qux.quux": "val3"
})

Gives you an idea? In practice it may become a lot more complex, especially when you start to add structures representing tables of fields and/or types of fields not easily cast to and from strings, like dates.
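On the object form, a minimongo-style filter stays simple, which is part of the argument here. A minimal in-memory sketch, assuming `foo` and `qux` live under `metadata` as in the document above; it supports only plain equality and `$gt` on primitive values, and descends into arrays the way Mongo does for a path like `metadata.qux.quux`. With the key/value-pair encoding, each of these conditions would instead become an `$elemMatch` over the pairs array, with a nested one for `qux`.

```javascript
// Collect every value reachable at a dotted path, descending into arrays.
function valuesAt(value, keys) {
  if (keys.length === 0) return Array.isArray(value) ? [value, ...value] : [value];
  if (Array.isArray(value)) return value.flatMap((item) => valuesAt(item, keys));
  if (value == null || typeof value !== "object") return [];
  return valuesAt(value[keys[0]], keys.slice(1));
}

// Minimal matcher: every condition must hold; each supports equality or { $gt }.
function matches(doc, query) {
  return Object.entries(query).every(([path, cond]) => {
    const candidates = valuesAt(doc, path.split("."));
    if (cond !== null && typeof cond === "object" && "$gt" in cond) {
      // Only compare values of the same type, as Mongo does for numbers.
      return candidates.some((v) => typeof v === typeof cond.$gt && v > cond.$gt);
    }
    return candidates.some((v) => v === cond);
  });
}
```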

BTW, you may think the illustration is unrealistic, but it is not. That is the type of filtering that has to be performed against lists of documents cached in minimongo from a subscription, or that a subscription's criteria may require. Enterprise LOB apps can be a bitch, no? Just a tiny bit more complex than a list of my friends' comments...

As an aside, why is it that instead of discussing the request to introduce a capability (i.e., to allow easy retrieval of data whose schema is unknown or variable), we end up discussing why we can't implement it using workaround x or y instead? Our brains appear to go into "solution mode", trying to fit the square peg in the round hole, instead of looking at it as a feature?

ramsaylanier commented 8 years ago

It should be pretty easy to extend the GraphQLTypes to build a JSON Scalar type.

In fact... https://github.com/wistityhq/waterline-graphql/pull/8
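A JSON scalar along those lines could look roughly like the following. This is a sketch, not the code from the linked PR: the three functions are kept dependency-free so the AST handling is visible, and with graphql-js you would pass them to `new GraphQLScalarType({ name: "JSON", serialize, parseValue, parseLiteral })`. Packages such as `graphql-type-json` provide a ready-made equivalent.

```javascript
// Values coming from the server or from query variables pass through unchanged.
const identity = (value) => value;

// Turn an inline literal in the query (graphql-js AST node) into a JS value.
function parseLiteral(ast, variables) {
  switch (ast.kind) {
    case "StringValue":
    case "BooleanValue":
      return ast.value;
    case "IntValue":
    case "FloatValue":
      return Number(ast.value); // graphql-js stores numeric literals as strings
    case "ObjectValue":
      return Object.fromEntries(
        ast.fields.map((field) => [field.name.value, parseLiteral(field.value, variables)])
      );
    case "ListValue":
      return ast.values.map((node) => parseLiteral(node, variables));
    case "NullValue":
      return null;
    case "Variable":
      return variables ? variables[ast.name.value] : undefined;
    default:
      return undefined; // e.g. enum literals: treated as invalid input
  }
}

const JSONScalarConfig = {
  name: "JSON",
  description: "Arbitrary JSON value",
  serialize: identity,
  parseValue: identity,
  parseLiteral,
};
```

The trade-off is that everything inside a JSON field is opaque to GraphQL: no per-field selection, validation or introspection, which is the price of not declaring the schema.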

Abdul-Hameed001 commented 7 years ago

I'm new to GraphQL and started searching for how to pass an array of data into MongoDB using GraphQL and Mongoose. The problem is how to write a mutation that takes multiple records, like mutation{ add:[{_id:7247,sal:9,time:"0.09",name:"iuyrf"},{_id:4962,sal:10,time:"0.09",name:"iuyrf"}]{ _id sal time name } }. My question is how to structure this query, and I'd also appreciate help designing the schema.
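One common way to accept several records is an input type plus a list argument. A minimal sketch with hypothetical type and argument names (`Entry`, `EntryInput`, `entries`); the in-memory `store` stands in for the collection, and with Mongoose the resolver would typically return the result of `insertMany` instead.

```javascript
// Hypothetical schema: an input type for one record, a list argument on the mutation.
const typeDefs = `
  type Entry { _id: Int! sal: Int time: String name: String }
  input EntryInput { _id: Int! sal: Int time: String name: String }
  type Query { entries: [Entry!]! }
  type Mutation { add(entries: [EntryInput!]!): [Entry!]! }
`;

// In-memory stand-in for the MongoDB collection.
const store = [];

const entryResolvers = {
  Query: { entries: () => store },
  Mutation: {
    add: (_parent, { entries }) => {
      store.push(...entries);
      return entries;
    },
  },
};

// The client then passes the array as an argument:
// mutation {
//   add(entries: [
//     { _id: 7247, sal: 9, time: "0.09", name: "iuyrf" }
//     { _id: 4962, sal: 10, time: "0.09", name: "iuyrf" }
//   ]) { _id sal time name }
// }
```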

stubailo commented 7 years ago

I'm going to close this issue since it's quite old. @Abdul-Hameed001 I think it's best to ask on Stack Overflow.

pabx06 commented 7 years ago

Stopped reading after 15 minutes and decided not to read any more. This is overkill.

stubailo commented 7 years ago

GraphQL is inherently optimized for situations when you have a static schema. So if you don't, then it's definitely possible that it's not the right choice! However in my experience most apps have a schema that changes very rarely unless it's something like a CMS.

mcblum commented 6 years ago

@stubailo we're actually researching this exact issue because we're building a CMS. This discussion is a bit old, but is the common consensus that Graph isn't a good choice for this application?

priedthirdeye commented 6 years ago

"GraphQL is inherently optimized for situations when you have a static schema."

"is the common consensus that Graph isn't a good choice for [a CMS]?"

I'm interested in the answer to this as well.

sjatkins commented 5 years ago

GraphQL encourages use of a well-defined schema. There are definitely benefits to doing so (and trade-offs).

In your example, you would probably currently just tweak the "variable" bit of your schema to be
"flds": [
      { "name": "f1", "value": "value1" },
      { "name": "f2", "value": "true" }
    ]

People moved to schemaless for really good reasons. In many of my applications, the frontend displays all of an object's data, at least as an optional rollover popup. Anything that says I have to know all the fields of that class of objects, and their types, is a non-starter. It pushes me back to mashing everything but some core set of common fields into a stringified JSON blob. Not unworkable, but ugly.

And what of schema migration? How painful is that going to be?

ancms2600 commented 5 years ago

Sixty iterations off the central finite curve, there's a Rick who works more with JSON than GraphQL.

I eventually went with a greatly simplified from-scratch implementation. Sharing in the hope that others may take gratuitous inspiration:

https://gist.github.com/ancmikesmullin/526b64b262561fb4ca1be824c2faec7f#file-test-sgql-alpha2-js-L63-L67

apollographql / apollo

GraphQL and variable schemas #5