Open MTheProgrammer opened 8 months ago
This solution makes sense to me. Is there any specific reason to index this HTML? I don't think so. Can you change your HTML file and verify that it works correctly?
I've forked the repo and updated the code that generates the html: https://github.com/BuilderIO/partytown/commit/506199790630557bfab403399ebd3f258ab641e5
In Gatsby, these files are copied from the Partytown package to the static directory:
function setupPartytown() {
  const path = require("path");
  const { copyLibFiles } = require("@builder.io/partytown/utils");

  // Copy the Partytown library files into static/~partytown before each build.
  exports.onPreBuild = async () => {
    await copyLibFiles(path.join(__dirname, "static", "~partytown"));
  };
}
I'll check GSC after a few days to verify whether this page is still being indexed.
It doesn't seem to help:
@MTheProgrammer I see 🤔 maybe it is the ~partytown folder. Can you try removing the ~ from the folder name, please?
The official documentation uses the ~partytown directory: https://partytown.builder.io/gatsby#copy-library-files
You mean to change the folder name in all places where it is used?
The page ~partytown/partytown-sandbox-sw.html is dynamic; the static folder contains only .js files:
My guess is that Googlebot crawls the page without a cache, and without a cache the request returns 404 because the Partytown worker has not yet been installed.
Every page includes an iframe that links to Partytown. However, the rel="nofollow" attribute is not valid on an iframe the way it is on an anchor tag such as <a href="www.example.com" rel="nofollow">.
EDIT: I'm testing a hack with an otherwise empty physical ~partytown/partytown-sandbox-sw.html file containing a noindex,nofollow directive. Once the worker is ready, it returns the correct dynamic page.
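For anyone who wants to try the same hack, here is a minimal sketch of how such a placeholder could be generated at build time. The helper name writeSandboxPlaceholder and the exact HTML body are my own assumptions, not something Partytown provides; only the file path and the noindex,nofollow directive come from this thread.

```javascript
const fs = require("fs");
const path = require("path");

// Assumed placeholder body: an empty page that only carries the robots directive.
const PLACEHOLDER_HTML = [
  "<!DOCTYPE html>",
  "<html>",
  "<head>",
  '<meta name="robots" content="noindex,nofollow">',
  "</head>",
  "<body></body>",
  "</html>",
  "",
].join("\n");

// Hypothetical helper (not part of Partytown): writes the placeholder to
// <staticDir>/~partytown/partytown-sandbox-sw.html and returns the file path.
function writeSandboxPlaceholder(staticDir) {
  const target = path.join(staticDir, "~partytown", "partytown-sandbox-sw.html");
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.writeFileSync(target, PLACEHOLDER_HTML, "utf8");
  return target;
}
```

In a Gatsby project this could be called from onPreBuild, for example writeSandboxPlaceholder(path.join(__dirname, "static")). Whether this actually stops the 404 reports in GSC is still untested.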
I see, great research. So I'm wondering how to serve a different, valid HTML page to the crawler while preserving the Partytown code in the HTML 🤔
@MTheProgrammer Did applying noindex, nofollow to the script folder help? I'm facing the same issue in GSC.
Describe the bug
Hello, other people mentioned this problem, but they couldn't reproduce the bug:
In Google Search Console, every crawl introduces new 404 pages for the
/~partytown/partytown-sandbox-sw.html?XXX
URL.
Reproduction
https://lucidmodules.com/~partytown/partytown-sandbox-sw.html?1706003192708
Steps to reproduce
This might vary depending on whether you've already visited this page and the web worker has been installed. However, after you clear the browser cache it should be as follows.
404 page on hard reload/first time download:
The correct page returned after worker has been installed:
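The first-visit case can also be checked outside the browser, since a plain HTTP client never installs the worker. Below is a rough sketch assuming Node 18+ (for the global fetch); coldRequestStatus is a hypothetical helper name, and a plain request only approximates what Googlebot does.

```javascript
// Hypothetical helper: requests the URL once, with no service worker involved,
// and resolves to the HTTP status code of the response.
// `fetchImpl` defaults to the global fetch (assumes Node 18+) and can be
// swapped for a stub when testing.
async function coldRequestStatus(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  return response.status;
}
```

Usage would be something like coldRequestStatus("https://lucidmodules.com/~partytown/partytown-sandbox-sw.html?1706003192708").then(console.log). If the guess in this issue is right, that logs 404 as long as no physical file exists at that path.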
Browser Info
Chrome
Additional Information
Maybe adding
<meta name="robots" content="noindex,nofollow">
to the head would solve the issue with Googlebot trying to index the partytown-sandbox-sw.html
page.
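As a rough illustration of that suggestion, the tag could be injected into whatever HTML string gets generated for the sandbox page. This is only a sketch: addRobotsMeta is a hypothetical helper, and I have not checked how Partytown actually builds this HTML.

```javascript
// The robots directive proposed above.
const ROBOTS_META = '<meta name="robots" content="noindex,nofollow">';

// Hypothetical helper: inserts the tag right after the opening <head> tag.
// Returns the input unchanged if it already has a robots meta tag or has no <head>.
function addRobotsMeta(html) {
  if (/<meta\s+name=["']robots["']/i.test(html)) {
    return html;
  }
  return html.replace(/<head(\s[^>]*)?>/i, (openTag) => openTag + ROBOTS_META);
}
```

The existing-tag check keeps the function idempotent, so running it twice over the same HTML does not duplicate the directive.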