Closed johno closed 3 years ago
Oh interesting! Yeah, not setting a parent in onCreateNode when creating a node is a definite no no. I'm struggling to think of when that'd ever actually make sense... if we can't think of any, we should just hard error there (or maybe warn initially to not break people's sites).
Another thing we could check for is if all (or most, say 80%+) or a plugin's nodes are deleted startup due to being stale. That'd be easy to calculate and again should be extremely uncommon so we could issue a strong warning that the plugin probably has a bug.
Yeah, not setting a parent in onCreateNode when creating a node is a definite no no.
Screenshot transformer does that - in very similar way as linked plugin (because both use createRemoteFileNode
that doesn't set parent). The difference is screenshot plugin also touches those file nodes so they don't get garbage collected. We definitely need to add ability to set parent to createRemoteFileNode
helper.
Ah yeah, ok so it's a weakness in the helper. Oh and we also shouldn't then delete nodes that are linked to from other nodes 🤔
That's the real problem. File nodes don't need a parent. They just need not deleted. If I'm remembering the stale node algorithm correctly, we only check for parents.
This is a pretty major bug. The stale collection stuff is our GC step but since we're not collecting all references, we're deleting too much.
I'm not sure though how to cheaply find ___NODE references since recursing through every node would be very expensive.
I'm not sure though how to cheaply find ___NODE references since recursing through every node would be very expensive.
Not ideal, but we do recurse through each node when we produce example value for schema inference. But using that has potential to easily introduce regressions when changing seemingly unrelated code.
I honestly am not sure we should do that (shield ___NODE linked nodes from being garbage collected) - especially with onCreateNode
callback that is really uniquely fitting for transformer plugins. The file node that is created and linked there is really derived from "parent" node so it makes sense to enforce that this File node is child.
I guess maybe instead of changing createRemoteFileNode
to add support for setting parent, we could adjust createParentChildLink
action to set both children on parent, and parent on child?
https://github.com/gatsbyjs/gatsby/blob/e651e9c1a4aef644801305fdc7ff4843c18d8267/packages/gatsby/src/redux/actions.js#L760-L773
That + warning/error containing instructions to use that action seems like it would solve this without adding overhead of recursing nodes to find ___NODE fields
Hmm thought of a cheap way to do the fix. Just track which nodes are created in onCreateNode from which node. This would be an implicit parent/child relationship even if the data wasn't modeled that way. We could wrap the createNode action to save this. Then in the GC step, we could do a deep search on the remaining parent nodes for links.
So I just tried to reproduce this with gatsby-source-instagram-all
and can't seem to be able to. It is consistently working for me.
redux-state.json
is written in .cache
correctly and onCreateNode
is being called everytime the server is started.
Any ideas?
Turns out the Instagram API returns tags for an image in random order every time. So content digest is different every time! That's why they were getting marked as dirty and couldn't reproduce.
Hmm thought of a cheap way to do the fix. Just track which nodes are created in onCreateNode from which node. This would be an implicit parent/child relationship even if the data wasn't modeled that way. We could wrap the createNode action to save this. Then in the GC step, we could do a deep search on the remaining parent nodes for links.
This was published in
- gatsby-remark-graphviz@1.0.7
- gatsby-source-drupal@3.0.24
- gatsby-source-filesystem@2.0.21
- gatsby-source-shopify@2.0.15
- gatsby-source-wordpress@3.0.37
- gatsby-transformer-screenshot@2.0.12
- gatsby@2.1.11
Going to close this since rather than guiding we've been able to fix it with code itself. Thanks @KyleAMathews, @pieh and @sidharthachatterjee!
Reopening this, because this was fixed in single instance (createRemoteFileNode
) and not generally, so users can still hit this - as demonstrated in #12981
Hiya!
This issue has gone quiet. Spooky quiet. 👻
We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.
If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!
Thanks for being a part of the Gatsby community! 💪💜
Hi there,
I have an issue and I think it might fit here. I wrote a plugin called gatsby-source-gh-readme, which you can find on NPM gatsby-source-gh-readme and the repo is on github here gatsby-source-gh-readme.
The plugin makes a graphql call to the github graphql api and should pull in the file contents of the README.md
file at the root of each project in a github user's account and then create a node (of mediaType: "text/markdown"
) for each in Gatsby.
The plugin works, for each README.md file Gatsby creates a node of type markdown and then those node are processed by the remark plugins just like the blog post markdown files...
However, frequently (almost every time,) I need to clear the cache before running gatsby develop
or the build will fail. If I clear the cache prior to running gatsby develop
the build succeeds.
I prepare the data for createNode()
with this function:
// Helper function that processes a repository node to match Gatsby's node structure
const processRepo = repo => {
const nodeId = createNodeId(`repo-readme-${repo.id}`);
const readme = repo.readme;
const nodeData = Object.assign({}, repo, {
id: `repo-readme-${nodeId}`,
parent: null,
children: [],
internal: {
mediaType: "text/markdown",
type: `GithubReadme`,
content: readme,
contentDigest: createContentDigest(repo)
}
});
return nodeData;
};
you can see that parent
is null and children
is an empty array. I don't understand what parent should be, or indeed what children should be…
Apart from the fact that there is probably a better way to write this plugin, it does seem to be example of what your talking about in this issue.
Apart from flagging this as a potential example of the problem discussed in this issue. I have some general questions about this plugin that are not related:
when writing a plugin how do I decide what parent, or child should be?
how could I write this to make use of graphql from gatsby itself, it seems silly to use another library to make the call to the github api, but I couldn't find a way to use gatsby's graphql from inside the plugin.
how can I get feedback on how I wrote this plugin? I am sure there is a better way to write this...
Here is an example of a theme that creates tag nodes that disappear from the cache for no obvious reason: https://github.com/reflexjs/reflex/issues/39.
This is related to #11747 as an isolated, reproducible case of caching issues. cc/ @pieh @DSchau
Description
Numerous plugins in the ecosystem aren't interlinking a node and the parent on
onCreateNode
which causes nodes in the cache to suddenly go missing. I stumbled across this issue working on a proof of concept withgatsby-source-instagram-all
but I'm sure this affects plenty of other plugins.It's not currently intuitive how to properly set up data sourcing in ways that interact well with the cache.
These issues are unfortunate because they create distrust in the cache (rightfully so). Logging warnings and helping to guide plugin authors towards the right path will help alleviate caching woes.
Steps to reproduce
Use a plugin like the reproduction and stop, then start, the server.
Expected result
Restarting the server and running the build should work.
Actual result
Restarting the server and running the build require clearing the cache.
Steps to fix
Currently, the steps to fix involves
rm -rf .cache
🙀.To solve the fundamental issue here will require two steps (I think)
• We should issue a warning in
onCreateNode
if there's no node passed as a parent and link to a doc that elaborates on the issue • We should have a doc for plugin authors that describe the Gatsby caching system and how nodes need to be interlinked to be properly cached and avoid errorsI'm anticipating that the doc should be in the style of a guide for plugin authors that might be sourcing nodes since the existing docs (that I can find) are more low-level but I'll defer to @marcysutton here.
Related