gatsbyjs / gatsby

The best React-based framework with performance, scalability and security built in.
https://www.gatsbyjs.com
MIT License
55.19k stars 10.33k forks source link

[Discussion] Feature: GraphQL schema snapshots for all data sources to solve undefined/empty data issues #3344

Closed jsanchez034 closed 5 years ago

jsanchez034 commented 6 years ago

Description

Currently there are issues with GraphQL schemas produced from data sources where at the moment gatsby develop or gatsby build is executed the data shape is incomplete, parts are in an empty state or in a different type than they would be if the source data was filled out. Below are a few example issues..

It would be great if as the data shape evolves on the data source side, you could create test pages or pieces of content that are fully filled out, meaning no empty fields. Then on the Gatsby side you could run a cli command called something like gatsby snapshot-schemas which would fetch the current data sources, run the source data through there regular plugin data normalization paths, run the data through the existing infer GraphQL schema code and then finally at the end take the schemas generated and save them off to a folder in /src called schemas.

On subsequent builds Gatsby could skip over inferring of the GraphQL schemas when it sees schemas defined in /src/schemas. These schema snapshots could then committed into a sites repo and allow for data shape changes to require new snapshots instead of just data source changes. These schema snapshots open up many possibilities such as the validation of incoming data shape changes. Schema snapshot diffs could be shown in Gatsby CLI as well when gatsby snapshot-schemas is ran again once initial schemas have been saved.

I would love some feedback on the idea itself from the contributors of the various source plugins. If the idea passes the smell test, I would like some feedback on what format the snapshots should be saved to. Maybe the GraphQL schema language using something like gestalt-graphql would be nice.

m4rrc0 commented 6 years ago

It seems like an interesting idea since it should work regardless of the source. It would definitely work for my usecase. While I am pretty confident about simple fields, I am wondering how it would work for assets though.

jsanchez034 commented 6 years ago

Good question about assets. I think for assets most data sources represent them as a string with a URL or path to the asset. The schema snapshot would just represent the type as a string with the URL/path or an asset object which has a string property with the URL/path.

jquense commented 6 years ago

Seems like an interesting idea, though why not skip the extra indirection of a snapshot and and define the schema fully? Like if we're going go a head and make fully complete data example, it seems like not much more work to directly specify the gql schema while also being clearer?

KyleAMathews commented 6 years ago

@jquense that's what I'm thinking. You could either run a command to "snapshot" the dynamically generated schema which would write out a file with the schema in normal graphql form or you could directly write the same form yourself.

jsanchez034 commented 6 years ago

Yeah I definitely like the idea of keeping it open to write your schema from scratch along side the "snapshot" feature. The benefits I see from the "snapshot" feature is the ability to ease yourself into learning the GraphQL schema syntax and reducing the amount of boilerplate code needed to get up and running with Gatsby and any source plugin.

I am currently looking into the internals of Gatsby's schema inferring logic in .. https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/schema And getting familiar with how GraphQL schemas are defined. I will try to create a proof on concept PR soon.

jquense commented 6 years ago

I totally understand easing users into a new thing, however I think plugins are a bit different. Authors are generally expected to understand gql already and users shouldn't generally have to mess with anything other than querying when consuming a source plugin.

That said tho the schema language for defining said schema is probably less complicated than querying, in terms of getting up to speed. I think it'd be time better spent allowing plugin authors to provide a .graphql file than working on snapshot ting a comprehensive data object, which may not even be enough to properly define the schema wholly.

I would like to mention to that manually defining the schema is already possible and supported via the gatsby api, albeit with graphql-js not the terser schema language.

jsanchez034 commented 6 years ago

I understand that plugin authors should know gql already and users shouldn't need to mess with schemas. But the issue is that the schema produced from inferring from the source data is many times incomplete based on the current state of the source data (see the issues I mentioned in the opening of this issue). So the way I see it is in order for users not to really mess around with having to write there schema's from scratch, they could set up the source data to complete, take a snapshot of the schema and then have Gatsby use that snapshot going forward. This way the user avoids having to write schemas from scratch in order to solve empty/undefined data causing the graphql queries to fail.

Also could you elaborate a little more on how " I think it'd be time better spent allowing plugin authors to provide a .graphql ..." would work?

On Thu, Dec 28, 2017 at 8:27 AM Jason Quense notifications@github.com wrote:

I totally understand easing users into a new thing, however I think plugins are a bit different. Authors are generally expected to understand gql already and users shouldn't generally have to mess with anything other than querying when consuming a source plugin.

That said tho the schema language for defining said schema is probably less complicated than querying, in terms of getting up to speed. I think it'd be time better spent allowing plugin authors to provide a .graphql file than working on snapshot ting a comprehensive data object, which may not even be enough to properly define the schema wholly.

I would like to mention to that manually defining the schema is already possible and supported via the gatsby api, albeit with graphql-js not the terser schema language.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/gatsbyjs/gatsby/issues/3344#issuecomment-354288169, or mute the thread https://github.com/notifications/unsubscribe-auth/AAuc_6X_eEWEf281KUspXxB1NuPZPeC4ks5tE5clgaJpZM4RNLJn .

jquense commented 6 years ago

So I think we are missing each other on where this should be solved :) in the WordPress example, I think rather than the plugin completely relying schema inference it should fully specify fields that may exist, e.g. all the nullable fields. Then a user won't run into the problem at all, because gatsby will have a complete schema for the datasource.

In cases where the plugin author can't possibly know what all the fields are I still think that, while more of a learning curve, it'll end up being more accurate and less error prone for the user to provide the rough schema vs fake data to be inferred as a schema. Yes you need to learn more up front (and we should provide good documentation for it) but you avoid users needing to understand the internal workings of how gatsby infers schemas. For example, if a user wants to specify a date field, it's (I think) simpler to learn to directly specify a date vs know to provide an iso compatible string that will be inferred as a date. Plus a lot of other features like connecting two data nodes, or specifying nullablility, is only really doable with a schema definition vs a data object

m4rrc0 commented 6 years ago

What about 2 different plugins? One that allows us to easily specify the schema in a file and another one to generate that file from a specific source. So the source might be different... For example a contentful space where every field is complete might be the source to generate the schema. Then the schema can be used on any contentful space that has the same architecture. And of course we might still tweak the schema manually if needed. In any case it seems like plugin no1 is the first step. We might still decide when it is done what to do next. @jquense is there somewhere an example of a manually defined schema? I didn't know it was possible.

jsanchez034 commented 6 years ago

I like the idea of two plugins. I think the API @jquense is talking about is https://www.gatsbyjs.org/docs/node-apis/#setFieldsOnGraphQLNodeType. But I think it only allows to add fields to a schema right? And not overwrite what was inferred.

On Thu, Dec 28, 2017 at 10:06 AM M4rrc0 notifications@github.com wrote:

What about 2 different plugins? One that allows us to easily specify the schema in a file and another one to generate that file from a specific source. So the source might be different... For example a contentful space where every field is complete might be the source to generate the schema. Then the schema can be used on any contentful space that has the same architecture. And of course we might still tweak the schema manually if needed. In any case it seems like plugin no1 is the first step. We might still decide when it is done what to do next. @jquense https://github.com/jquense is there somewhere an example of a manually defined schema? I didn't know it was possible.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/gatsbyjs/gatsby/issues/3344#issuecomment-354302287, or mute the thread https://github.com/notifications/unsubscribe-auth/AAuc_4fqrocUXxBPkFsK2tFn2VVDduv3ks5tE66AgaJpZM4RNLJn .

jquense commented 6 years ago

Yeah I think you are right it doesn't all overwriting :/

I'm generally in favor of plugins to solve issues but I think this sort of thing may need to handled in the core api to strike the right usability notes (I could totally be wrong tho). In my head the ideal api would be just to point gatsby at a .graphql file and it does the rest, but that may not as nice as I imagine. I think it's all probably worth exploring in plugins tho if possible before possibly moving into core... I of course defer to what Kyle thinks since he's the boss :)

jsanchez034 commented 6 years ago

Ok cool I will do some exploring around the first plugin for defining schemas as .graphql files. I'm interested to hear Kyle's input too :) Thanks for the great discussion everyone!

On Thu, Dec 28, 2017 at 10:31 AM Jason Quense notifications@github.com wrote:

Yeah I think you are right it doesn't all overwriting :/

I'm generally in favor of plugins to solve issues but I think this sort of thing may need to handled in the core api to strike the right usability notes (I could totally be wrong tho). In my head the ideal api would be just to point gatsby at a .graphql file and it does the rest, but that may not as nice as I imagine. I think it's all probably worth exploring in plugins tho if possible before possibly moving into core... I of course defer to what Kyle thinks since he's the boss :)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/gatsbyjs/gatsby/issues/3344#issuecomment-354306088, or mute the thread https://github.com/notifications/unsubscribe-auth/AAuc_6wNLHfte9zCZ2OqgirIMRadX609ks5tE7RTgaJpZM4RNLJn .

pieh commented 6 years ago

@jsanchez034 did you look more into it? I will propably try to work on this next week and I'm interested if you did some research and can share your findings

jsanchez034 commented 6 years ago

Hey Michal, I did some research and have some next steps in mind. I'll post my findings when I get back on my machine in a few hours.

On Fri, Jan 5, 2018 at 7:16 PM Michal Piechowiak notifications@github.com wrote:

@jsanchez034 https://github.com/jsanchez034 did you look more into it? I will propably try to work on this next week and I'm interested if you did some research and can share your findings

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/gatsbyjs/gatsby/issues/3344#issuecomment-355699899, or mute the thread https://github.com/notifications/unsubscribe-auth/AAuc_3w7px_NJjL2EKl8G5TP1GJgQFVWks5tHrtNgaJpZM4RNLJn .

jsanchez034 commented 6 years ago

Hey @pieh, I started looking into how we can inject whole new schemas into the schema that Gatsby infers from the source data. So I figured a plugin could be created to add schemas to Gatsby. Below are my notes for this plugin...

Schema writer plugin

What should the plugin do?

Those are my rough notes on a schema writer plugin. The other plugin that @MarcCoet mentioned would be one where you could point the plugin to endpoints on a domain where data is set up to be in a complete state then schemas would be generated and saved off to src/schema.

Note: Schemas would be defined using GraphQL type language

pieh commented 6 years ago

Unfortunately I don't think graphql-tools will work here as it was meant to merge full schemas (that is both definitions and data/resolvers) and adding links between those schemas.

From documentation of mergeSchemas function (https://www.apollographql.com/docs/graphql-tools/schema-stitching.html#mergeSchemas) about passed schemas to merge:

schemas is an array of either GraphQLSchema objects or strings. For strings, only extend type declarations will be used. Passing strings is useful to add fields to existing types to link schemas together, as described in the example above.

We would rather want to use string option, as GraphQLSchema require to define at least 1 query (or it throw an error), but extending type doesn't allow to overwrite field type. Error is thrown when trying to overwrite existing field (conflict resolution function isn't called so nothing we can do about it unfortunately):

Field "frontmatter.title" already exists in the schema. It cannot also be defined in this type extension

As for second option (using GraphQLSchema) I "hacked" it to work by adding dummy query, but initial problem is I can't use types from "main" schema - we can't simply do this:

type frontmatter {
  featuredImage: File
  title: String
}

because File isn't defined in this additional schema. We could probably append builtin types so it would work but I feel like trying to "fool" this tool to do stuff it's not designed to do (at least at this point in time) is not the way to go.

I'll will try to evaluate 2 other approaches:

pieh commented 6 years ago

I've made some progress. Decided to go with gatsby specific approach for now (adding support for "forced types" in https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/schema ) as I found it easier to work with than playing with graphql schema objects. I'll try to clean up my code a bit so it would be more readable and create pull request this week to discuss it further there.

Just little sneak peak of results (imo quite promising): I've used https://github.com/gatsbyjs/gatsby-starter-blog and used frontmatter in posts to emulate some problems that can cause build errors caused by queries that don't match schema:

// post #1
---
title: Hello World
date: "2015-05-01T22:12:03.284Z"
featuredImage: "sasalty_egg.jpg" // badly formatted value that wouldn't allow to make this File type 
test: "some string" // incompatible value type with other posts (in other posts test field is object) - it would be skipped in schema
---

// post #2
---
title: My Second Post!
date: "2015-05-06T23:46:37.121Z"
test:
  floatField: 5.5
  intField: 99
  arrayOfStringsField:
    - "it"
    - "works"
---

// post #3
---
title: New Beginnings
date: "2015-05-28T22:40:32.169Z"
test:
  stringField: "string"
  boolField: true
---

Used type definitions:

type MarkdownRemark {
  frontmatter: frontmatter_forced_type
}

type frontmatter_forced_type {
  featuredImage: File
  title: ImageSharp
  test: TotallyCustomType
}

type TotallyCustomType {
  stringField: String
  floatField: Float
  intField: Int
  boolField: Boolean
  arrayOfStringsField: [String]
}

Note: those don't have to be full type definitions (f.e. in MarkdownRemark definition I just force type of frontmatter field to not rely on automatic type naming)

Here's how frontmatter looks before using my changes (no test field, "bad" type of featuredImage field): before

And after: after

jsanchez034 commented 6 years ago

Hi @pieh! Thanks for taking this on, your solution looks very promising! Sorry I couldn't contribute more to this issue in the past couple of weeks, I haven't had the time to dive into this. Let me know if there is anything I can help with.

pieh commented 6 years ago

If anyone is interested you can view my code changes https://github.com/gatsbyjs/gatsby/compare/master...pieh:schema_wip (after deleting some dead code and refactoring some stuff I had yesterday it's much less code than I thought :) ).

Don't have time to put PR for comments with good description and website repository used for testing, so figured I would just share my code for now. If anyone want to play with it - place gatsby-schema.gql file with schema definition in root directory of your project (along gatsby-node.js, etc.).

m4rrc0 commented 6 years ago

Any chance this could make its way into a PR? A lot of people would be glad to have this I think.

pieh commented 6 years ago

@MarcCoet more I dived into this code more I realized that my approach in that branch can be used only as proof of concept. I plan on working on this to have it done proper way but this a lot bigger task than I originally anticipated (as I was learning more about how currently schema creation is done and what features it offers I didn't earlier know about as I was coding stuff)

So currently schema creation looks like this: current

Implementation is doubled for each source of types. My proof of concept (branch I linked) adds just another source and now we have it tripled: poc

It would be unmaintainable to just add this in current form.

What I want to do is so this looks something like that: proposed But this require some major refactoring/rework as current schema creation is designed with inferring types from data as only source of data. After that, adding another source would be trivial with knowledge I gained during development of my proof of concept.

I want to create RFC issue for this (I created flowcharts earlier this week :) ). Just need to get some spare time to actually commit to discussion this would start.

HZSamir commented 6 years ago

@MarcCoet I personally have moved all my projects temporarily to react-static, as this feature is essentially a deal-breaker for me. But I plan to transition back to Gatsby the second this is fixed.

pieh commented 6 years ago

I created RFC issue about refactoring ( https://github.com/gatsbyjs/gatsby/issues/4261 ), if anyone is interested to join discussion, please do so.

m4rrc0 commented 6 years ago

Thanks for the update and the work @pieh . I'll keep an eye on the RFC issue.

i8ramin commented 6 years ago

Hey guys. Any updates on this? Currently a blocker/show-stopper for us to use Gatsby + Contentful and just wanted to know if there was any active development or if we should consider other options. Thank you.

pieh commented 6 years ago

@i8ramin There isn't active development yet, I plan to start it this week.

nicolaslair commented 6 years ago

Hi guys, just checking if someone made any progress with this issue - we just discovered it wehen dealing with optional / empty fields in Contentful > Gatsby - so it would be nice to have a solution - obviously :) Thanks in advance!

nthndnn commented 6 years ago

Hey @pieh, just wondered if anything has happened with the development? Would love to use Gatsby with Contentful for a project, however, this is a little bit of a blocker. cheers in advance!

pieh commented 6 years ago

Hey @nicolaslair and @nathanjdunn ! Sorry for lack of updates on this. I had to put this project on hold (...again) and focus my time on gatsby v2 for time being. I wish I had better news :(

nthndnn commented 6 years ago

Hey @pieh, will this issue be fixed in Gatsby v2?

pieh commented 6 years ago

This is not scheduled for v2 - this feature won't be breaking change so plan is to push v2 as fast as possible and then add this feature in one of v2 minor realeases

dominicfallows commented 6 years ago

@pieh and anyone else, I'm doing a messy noramlizer to make sure that there is always the same data model for GraphQL regardless of optional/null fields etc. Has anyone come up with a better workaround ahead of something more core being developed?

pieh commented 6 years ago

Pattern I've used was to create "dummy" entries (in CMS / markdown) that would have all optional fields filled and later when building gatsby site I would have filter in queries, so dummy nodes not show up in results.

dominicfallows commented 6 years ago

@pieh that is a good idea, thanks I think I'll give that a go

dominicfallows commented 6 years ago

For reference, I'm actually now doing this dummy entry workaround in my source plugin, which keeps the CMS team happy, and means the normalizing and exclusion of dummy entry is handled within the Gatsby code, rather than in the CMS or UI React components.

HZSamir commented 6 years ago

@dominicchapman Care to create a demo repo for this? or a Medium post? or anything at this point

m4rrc0 commented 6 years ago

Any news?

m4rrc0 commented 6 years ago

Just seen @Undistraction's plugin https://github.com/Undistraction/gatsby-plugin-node-fields .

a simple, consistent way to manage the creation of fields on your nodes, with support for default values, transformations and validation of values

I am thinking this should be part of core... I'll definitely give it a try.

drobinhood commented 6 years ago

Any updates on this?

pieh commented 6 years ago

This will be resumed after v2 is released. I don't have bandwidth to handle v2 and this at the same

supermoos commented 6 years ago

Is there a suggested workaround until sometime after V2 when this would be worked on again? Sounds weird if so many people using GatsbyJS has never had a need for optional fields. I'm using it with Contentful btw.

westmark commented 6 years ago

We're also using it with Contentful and this behaviour is driving us nuts. As we see it the Contentful plugin is mostly at fault here for (supposedly) constructing the schema from data received rather from the actual content type schema received over the API.

pieh commented 6 years ago

@westmark contentful plugin don't have power over this - it's just right now Gatsby doesn't allow plugins to set explicit types and schema is inferred from available data

As for workaround - if possible create dummy entry for each content type that will contain all optional fields and then filter that dummy entry out when you are creating pages or listings of entries. Dummy entry data will ensure full schema is built

supermoos commented 6 years ago

@pieh Thanks for the work-around suggestion for now :-)

westmark commented 6 years ago

@pieh I did not know that - thank you for clarifying. A follow-up question then, wouldn't it be possible for plugin authors to inject dummy values for missing data before passing it on to Gatsby? If Gatsby admits keys with undefined as value into the schema I would regard this as a far better workaround.

ErisDS commented 5 years ago

We're also running into this issue with gatsby-source-ghost. Our data schema has a lot of fields that can be legitimately returned as nulls, and are therefore omitted by Gatsby.

I 100% hear and understand that any work on this is paused until after than land of v2 (which I'm very much looking forward to). I wanted to flag up the Ghost source plugin as being another good example where this occurs & offer to help out if there is anything we can do whenever the time comes to resume this 🙂

stefanoverna commented 5 years ago

Same problem on https://github.com/datocms/gatsby-source-datocms 😄

westmark commented 5 years ago

Question: In V2, could we use this to avoid the "query on missing data"-problem?

https://next.gatsbyjs.org/docs/actions/#addThirdPartySchema

In other words, could the Contentful plugin use this function to inject the schema for a content model that has no data/entries yet?

stoltzrobin commented 5 years ago

@pieh have you had any time to refine this and start any development? 😁

moreguppy commented 5 years ago

Hello everybody, I'm quite new to GatsbyJS but have been using Contentful for a long time, and I just started trying out the framework and messing around with a bit of GraphQL. Ran into this issue quite quickly as it is often useful to have optional fields in Contentful.

However, I discovered if you are pulling down a collection (eg. allContentfulNews etc) and you have at least 1 entry in that collection which has content in all the fields (in my case I have an optional 'Expiry Date'), then it won't break the build and the entries with the Expiry Date empty returns as null. So as a workaround I'm planning on keeping this "template" post in, and hiding it on the frontend.