crate / crate-clients-tools

Clients, tools, and integrations for CrateDB.
https://crate.io/docs/clients/
Apache License 2.0

Documentation: Sitemap publishing/referencing stopped, or never worked #59

Closed: amotl closed this issue 9 months ago

amotl commented 9 months ago

Problem

Through GH-58, we discovered that the sitemap.xml files of the documentation artefacts may no longer be referenced and respected.

Aha! It looks like https://cratedb.com/robots.txt and https://cratedb.com/sitemap.xml do not include any references to the resources at https://cratedb.com/docs/ any longer? Do you know what may have caused this change?

/cc @msbt

amotl commented 9 months ago

This item is missing when compared to an earlier point in time.

<url>
<loc>https://crate.io/docs</loc>
<lastmod>2022-04-19</lastmod>
</url>

-- https://web.archive.org/web/20220627082026/https://crate.io/sitemap.xml
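A comparison like this can be scripted. The following is a minimal sketch, assuming the sitemap follows the sitemaps.org schema; the function name `locs_under_prefix` and the second sample URL are hypothetical and only serve as illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def locs_under_prefix(sitemap_xml: str, prefix: str) -> list[str]:
    """Return all <loc> values in a sitemap document that start with `prefix`."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.findall(".//sm:loc", SITEMAP_NS) if el.text]
    return [loc for loc in locs if loc.startswith(prefix)]


# Sample data: the first entry mirrors the archived item quoted above,
# the second entry is a made-up URL for contrast.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://crate.io/docs</loc><lastmod>2022-04-19</lastmod></url>
  <url><loc>https://crate.io/blog</loc></url>
</urlset>"""

print(locs_under_prefix(SAMPLE, "https://crate.io/docs"))
# -> ['https://crate.io/docs']
```

An empty result for the docs prefix would confirm that the sitemap no longer references any documentation resources.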

amotl commented 9 months ago

Also, I am pretty sure we had a master sitemap.xml file somewhere, maybe controlled / augmented by WordPress, which indexed all the second-level sitemap files of the documentation artefacts. For example:

Once more, this explains why documentation discoverability is so bad, specifically after updates, as @hlcianfagna recently reported:

Hi, it seems this page may not be getting indexed, could you take a look? https://cratedb.com/docs/crate/clients-tools/en/latest/integrate/etl.html (try searching for singer or meltano or jinja)

We need to fix this. 💥

msbt commented 9 months ago

@amotl Very true, we used to have a meta-sitemap in WordPress which included all WP pages and all docs sitemaps. At the moment, we submit the docs sitemaps to GSC/Bing individually, because the HubSpot-internal sitemap only lets us add pages manually. However, what we could do is create a new sitemap index [1] that includes both the docs sitemaps and the regular sitemap, and then link it in the robots.txt:

image

[1] https://www.sitemaps.org/protocol.html#index
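For illustration, a sitemap index as described in [1] could be generated along these lines. This is only a sketch, not the actual setup: the helper names, the docs sitemap path, and the example.com index URL are assumptions. The `Sitemap:` line is the standard robots.txt directive for referencing a sitemap or sitemap index.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Serialize a <sitemapindex> document listing the given sitemap URLs."""
    # Emit the protocol namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(root, encoding="unicode")


def robots_directive(index_url):
    """Return the robots.txt line that points crawlers at the index."""
    return f"Sitemap: {index_url}"


# Assumed inputs: the regular site sitemap plus one docs sitemap.
# The docs sitemap path is a guess, not a confirmed location.
SITEMAPS = [
    "https://cratedb.com/sitemap.xml",
    "https://cratedb.com/docs/crate/clients-tools/en/latest/sitemap.xml",
]

print(build_sitemap_index(SITEMAPS))
# Placeholder index location, for illustration only.
print(robots_directive("https://example.com/sitemap_index.xml"))
```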

msbt commented 9 months ago

> This item is missing when compared to an earlier point in time.
>
> <url>
> <loc>https://crate.io/docs</loc>
> <lastmod>2022-04-19</lastmod>
> </url>
>
> -- https://web.archive.org/web/20220627082026/https://crate.io/sitemap.xml

This is because the docs page is only a redirect these days.

msbt commented 9 months ago

@amotl While on topic: is there a reason why we haven't merged the jdbc, npgsql, pdo, python and dbal repositories into one single project, and instead maintain them as separate ones?

amotl commented 9 months ago

> Very true, we used to have a meta-sitemap in WordPress which included all WP pages and all docs sitemaps.

a) Good. Bad that this has been dropped. Let's re-establish the meta index page NOW.

> Is there a reason why we haven't merged several projects?

b) Yes, because they simply are separate projects.

msbt commented 9 months ago

@amotl https://cratedb.com/robots.txt

New index is live: https://cratedb.com/hubfs/sitemap_index.xml

It was created manually, so whenever a docs repository is added or removed, the index needs to be updated manually as well.
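Since the index is maintained by hand, a small check could flag when it drifts from the set of docs sitemaps that are expected to exist. The sketch below is one possible approach, not something that is in place; the function name `index_drift` and all sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def index_drift(index_xml, expected_sitemaps):
    """Compare a sitemap index against the expected sitemap URLs.

    Returns (missing, stale): URLs expected but not listed in the index,
    and URLs listed in the index but no longer expected.
    """
    root = ET.fromstring(index_xml)
    listed = {
        el.text.strip()
        for el in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if el.text
    }
    expected = set(expected_sitemaps)
    return sorted(expected - listed), sorted(listed - expected)


# Made-up index content and expectations, for illustration only.
INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/docs/a/sitemap.xml</loc></sitemap>
  <sitemap><loc>https://example.com/docs/b/sitemap.xml</loc></sitemap>
</sitemapindex>"""
EXPECTED = [
    "https://example.com/docs/a/sitemap.xml",
    "https://example.com/docs/c/sitemap.xml",
]

missing, stale = index_drift(INDEX, EXPECTED)
print("missing from index:", missing)
print("stale in index:", stale)
```

Run periodically, such a check would report both directions of drift: a new docs project that was never added to the index, and a removed one that is still listed.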

amotl commented 9 months ago

Thank you very much for bringing this back so quickly! 💯