FabricMC / fabricmc.net

The source code and content for https://fabricmc.net/
MIT License

[Suggestion] Docs not indexed by search engines #40

Closed: Starmania closed this issue 10 months ago

Starmania commented 1 year ago

Hey!

Sometimes I want to search, from my browser, for a Yarn mapping in a specific version of Minecraft. The ideal query for that is site:https://maven.fabricmc.net/docs/yarn-1.19.4+build.2/ class_1234, but it only works on one condition: Google must already have indexed the page http://maven.fabricmc.net/docs/yarn-1.19.4+build.2/index.html. Because nothing links to that page, Google (and other search engines) don't know it exists...

This could be implemented quite easily with sitemaps.

A sitemap lists the URLs of all pages with important content. Good news: you only need to list maven.fabricmc.net/docs/<project>/index.html for each project, because the search engine will crawl the rest of the pages from there! Also, a sitemap doesn't need to be at the root of the domain. It could, for example, live at maven.fabricmc.net/docs/sitemap.xml and be referenced from a robots.txt (which does need to be at the root, but doesn't have to be linked from anywhere, just like the docs/ directory).
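A minimal Python sketch of the layout suggested here: an XML sitemap in the sitemaps.org format that lists only each project's index.html entry page. The function name build_xml_sitemap and the list of project directory names are illustrative assumptions, not what the Fabric maven actually runs.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_xml_sitemap(projects: list[str], base: str = "https://maven.fabricmc.net/docs/") -> str:
    """Return a sitemap that lists only each project's index.html entry page.

    `projects` is a hypothetical list of docs directory names,
    e.g. ["yarn-1.19.4+build.2"].
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for project in projects:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        # Only the entry page is listed; crawlers are expected to follow
        # links from it to the rest of the javadoc pages.
        loc.text = f"{base}{project}/index.html"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_xml_sitemap(["yarn-1.19.4+build.2"]))
```

Listing only entry pages keeps the file small; whether a crawler actually reaches every linked page is up to the search engine.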

So even if you don't have time to open your favorite IDE and only have your phone, with nothing but a web browser to check something quickly, you could still do it!

PS: Sorry if this isn't the right place to open an issue about maven.fabricmc.net, but I couldn't find a better one.

modmuss50 commented 1 year ago

I'll try and take a look. I'm not too familiar with how sitemaps and robots.txt work, so I'll have to do a bit of research into this. It should be doable though 👍

Starmania commented 1 year ago

If you want help, I'm here; I only need to know how /docs/ is maintained.

modmuss50 commented 1 year ago

It seems to be fairly straightforward: we already have a list of all the paths at https://maven.fabricmc.net/jdlist.txt. We just need to write a quick bash script or something to convert it to a sitemap.
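The conversion could look roughly like this Python sketch. The thread doesn't show the format of jdlist.txt, so this assumes (hypothetically) one docs directory name per line; jdlist_to_urls is an illustrative name, not an existing script.

```python
from urllib.parse import quote

DOCS_BASE = "https://maven.fabricmc.net/docs/"


def jdlist_to_urls(jdlist_text: str) -> list[str]:
    """Turn a jdlist.txt-style listing into index.html URLs.

    Assumes one docs directory name per line; blank lines and
    duplicate entries are skipped, first-seen order is kept.
    """
    urls = []
    seen = set()
    for line in jdlist_text.splitlines():
        name = line.strip().strip("/")
        if not name or name in seen:
            continue
        seen.add(name)
        # Percent-encode unsafe characters, keeping '+' and '/' as-is.
        urls.append(f"{DOCS_BASE}{quote(name, safe='+/')}/index.html")
    return urls
```

For example, jdlist_to_urls("yarn-1.19.4+build.2\n") yields the single URL https://maven.fabricmc.net/docs/yarn-1.19.4+build.2/index.html.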

I'm just reading Google's docs, and they seem to support a text-based format: https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap#text. It should be trivial to also output this from our existing bash script and then add a robots.txt that points to it.
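A rough Python sketch of both files. A text sitemap is just one fully qualified URL per line in UTF-8, capped at 50,000 URLs per file, and robots.txt advertises it with a Sitemap: line. The function names and the permissive "Disallow:" rule are assumptions for illustration, not the maven's actual configuration.

```python
from urllib.parse import urlsplit

# Per-file limit from the sitemaps protocol.
MAX_URLS = 50_000


def text_sitemap(urls: list[str]) -> str:
    """Plain-text sitemap: one fully qualified URL per line."""
    if len(urls) > MAX_URLS:
        raise ValueError("split into several sitemaps (max 50,000 URLs each)")
    for url in urls:
        parts = urlsplit(url)
        if not (parts.scheme and parts.netloc):
            raise ValueError(f"not a fully qualified URL: {url}")
    return "".join(url + "\n" for url in urls)


def robots_txt(sitemap_url: str) -> str:
    """robots.txt that allows all crawlers and points them at the sitemap."""
    # An empty Disallow: means nothing is blocked.
    return f"User-agent: *\nDisallow:\n\nSitemap: {sitemap_url}\n"


print(robots_txt("https://maven.fabricmc.net/sitemap.txt"))
```

Writing the output of text_sitemap(...) to sitemap.txt with UTF-8 encoding matches the text format Google describes.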

My only slight worry is disrupting the SEO of the main website (https://fabricmc.net/ vs https://maven.fabricmc.net/), but as far as I can tell this shouldn't be an issue.

modmuss50 commented 1 year ago

Was nice and easy!

https://maven.fabricmc.net/robots.txt and https://maven.fabricmc.net/sitemap.txt

Hopefully that helps, many thanks for the suggestion 👍
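One way to confirm that a robots.txt advertises its sitemap is Python's standard urllib.robotparser, whose site_maps() method (Python 3.8+) returns the declared Sitemap URLs or None. The helper name advertised_sitemaps is illustrative; the example parses a sample body offline instead of fetching the live file.

```python
from urllib.robotparser import RobotFileParser


def advertised_sitemaps(robots_body: str) -> list[str]:
    """Return the Sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


sample = "User-agent: *\nDisallow:\n\nSitemap: https://maven.fabricmc.net/sitemap.txt\n"
print(advertised_sitemaps(sample))
```

To check the live file instead, RobotFileParser("https://maven.fabricmc.net/robots.txt") followed by .read() fetches it over the network before calling site_maps().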

modmuss50 commented 1 year ago

After a bit more thought, we think it might be best to index only the latest version. I am somewhat worried about crawlers trying to index thousands of large javadoc pages.
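A possible Python sketch of a "latest only" filter, under two hypothetical assumptions: docs directories are named <project>-<version> with the version starting with a digit, and the listing runs from oldest to newest, so the last entry per project is the latest. Comparing Minecraft versions directly (snapshots, pre-releases) is harder, which is why this relies on list order instead.

```python
import re

# Hypothetical naming scheme: "<project>-<version>", version starting with a digit,
# e.g. "yarn-1.19.4+build.2" or "fabric-api-0.76.0+1.19.4".
NAME_RE = re.compile(r"^(?P<project>.+?)-(?P<version>\d.*)$")


def latest_only(doc_dirs: list[str]) -> list[str]:
    """Keep one docs directory per project: the last one seen.

    Assumes the input is ordered oldest to newest.
    """
    latest: dict[str, str] = {}
    for name in doc_dirs:
        match = NAME_RE.match(name)
        project = match["project"] if match else name
        # Later entries overwrite earlier ones for the same project.
        latest[project] = name
    return list(latest.values())


print(latest_only(["yarn-1.19.3+build.5", "yarn-1.19.4+build.2"]))
```

The filtered list could then feed the same sitemap generation as before, leaving older versions reachable but unlisted.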

Starmania commented 1 year ago

Don't worry about that, keep the full list. Why?

Starmania commented 1 year ago

Also, let the search engines do their job. They need some time (only 1 to 3 days) to build a new index, and they will rank the pages by the traffic they generate.

Starmania commented 1 year ago

Hello, have you added a sitemap?

modmuss50 commented 10 months ago

This has been resolved; the sitemap and SEO should be better now.