Open jeoffreyfischer opened 6 days ago
All of the new pages listed under "Alternate page with proper canonical tag" were browse pages.
Essentially, these browse pages list the pages at the base of each route. They don't have any unique content, and they'll be removed when the VM gets switched off. I'm not sure why they're being de-indexed, but they'll be removed soon anyway, so it's not worth trying to re-index them.
I didn't find any other pages in the "Alternate page with proper canonical tag" category that shouldn't be there.
Other pages in this category on our site are:
https://www.ssw.com.au/rules/add-context-reasoning-to-emails (user canonical already points to this route, requested re-indexing)
https://www.ssw.com.au/ssw/MenuMap.aspx (Google selected correct canonical URL: https://www.ssw.com.au/ssw/menumap.aspx)
https://www.ssw.com.au/SSW/Database/LinksSoftwareUpdates.aspx (user-declared canonical points to http://www.ssw.com.au/ssw/Database/LinksSoftwareUpdates.aspx, same link but with missing s)
https://www.ssw.com.au/people/zach-keeping/ (Google selected canonical URL https://ssw.com.au/people/zach-keeping/)
https://www.ssw.com.au/people/adam-cogan/ (user canonical points to https://ssw.com.au/people/adam-cogan/, which is bad)
https://ssw.com.au/people/anastasia-cogan/ (user canonical points to https://www.ssw.com.au/people/anastasia-cogan/)
https://www.ssw.com.au/people/bob-northwind/ (user canonical points to https://ssw.com.au/people/bob-northwind/)
https://www.ssw.com.au/people/chris-schultz/ (user canonical points to https://ssw.com.au/people/chris-schultz/)
https://www.ssw.com.au/people/manu-gulati/ (user canonical points to https://ssw.com.au/people/manu-gulati/)
https://www.ssw.com.au/people/sam-wagner/ (user canonical points to https://ssw.com.au/people/sam-wagner/gulati/)
https://www.ssw.com.au/people/tino-liu/ (user canonical points to https://ssw.com.au/people/tino-liu/)
https://www.ssw.com.au/people/zach-keeping/ (user canonical points to https://ssw.com.au/people/zach-keeping/)
https://www.ssw.com.au/people/ (user canonical points to https://ssw.com.au/people/)
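The mismatches in the list above fall into a few kinds: a different host (www vs. non-www), a different scheme (http vs. https), or a path that differs only in letter case. Here's a rough sketch of how those could be classified automatically. The function name `canonical_mismatches` and its labels are made up for illustration; this isn't something that exists in our codebase.

```python
from urllib.parse import urlsplit


def canonical_mismatches(page_url: str, canonical_url: str) -> list[str]:
    """Return the ways a declared canonical differs from the page URL.

    Hypothetical helper: the labels ("scheme", "host", "path-case", "path")
    are illustrative only.
    """
    page, canon = urlsplit(page_url), urlsplit(canonical_url)
    diffs = []
    if page.scheme != canon.scheme:
        diffs.append("scheme")
    # Host names are case-insensitive, so compare them lowercased.
    if page.netloc.lower() != canon.netloc.lower():
        diffs.append("host")
    if page.path != canon.path:
        # Separate a case-only difference from a genuinely different path.
        same_ignoring_case = page.path.lower() == canon.path.lower()
        diffs.append("path-case" if same_ignoring_case else "path")
    return diffs


if __name__ == "__main__":
    print(canonical_mismatches(
        "https://www.ssw.com.au/people/adam-cogan/",
        "https://ssw.com.au/people/adam-cogan/",
    ))
```

An empty list means the canonical is self-referencing. Anything else is worth a look before requesting re-indexing.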
Alternate page with proper canonical tag - 26,094 pages
See comment above
Excluded by 'noindex' tag - 1,666 pages
Most of these are on the v1 site. However, I've created a PR to strip the noindex tags from the archived pages that had them.
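To double-check that an archived page no longer carries the tag after the PR, something like the sketch below could be run against the page's HTML. It only looks at `<meta name="robots">` and `<meta name="googlebot">` tags. It doesn't cover the `X-Robots-Tag` HTTP header or the `none` directive. `RobotsMetaFinder` and `has_noindex` are illustrative names, not part of the PR.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from robots/googlebot meta tags (illustrative)."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```

For example, `has_noindex('<meta name="robots" content="noindex, follow">')` returns `True`.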
Blocked by robots.txt - 257 pages
Most of these pages are on the v1 site, which is being switched off anyway. Only 4 pages on the live site appear here, and they're all 500/404 pages, so I need to discuss with @wicksipedia whether they should be indexed. I'm assuming they shouldn't be.
Relevant excluded pages include:
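For reference, here's a minimal sketch of checking which URLs a given robots.txt blocks, using Python's standard `urllib.robotparser`. The rules and URLs in the usage example are hypothetical, not our actual robots.txt. Note that Python's parser doesn't implement Google's wildcard matching, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls, user_agent: str = "Googlebot"):
    """Return the URLs that the given robots.txt text disallows.

    Illustrative helper: parses the rules from a string rather than
    fetching them, so it can be run offline.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


if __name__ == "__main__":
    # Hypothetical rules and paths, for illustration only.
    rules = "User-agent: *\nDisallow: /404\nDisallow: /500\n"
    print(blocked_urls(rules, [
        "https://www.ssw.com.au/404",
        "https://www.ssw.com.au/people/",
    ]))
```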
Server error (5xx) - 89 pages
Again, this was mostly noise coming from the v1 site. There are also a few links to Tina template pages in the static content of some pages on the site. When Google tries to crawl these links, the server, of course, returns a 500. I will need to investigate this on Monday.
Blocked due to other 4xx issue - 1 page
Google was trying to index a broken link within a rule. I've fixed the link here: https://github.com/SSWConsulting/SSW.Rules.Content/pull/8813
Blocked due to access forbidden (403) - 1 page
The broken link points to http://ssw.com.au/ssw/TeamCalendar/Installation/; however, there's no default page at that route. Google attempted to index the page because other pages on the site point to that URL.
This page was also in the 403 list: https://www.ssw.com.au/ssw/Redirect/Access/AccessTrial.htm. However, it's a redirect that will be gone when the v1 server goes offline. Adam already agreed to scrap the /Redirect route.
Based on the email chain:
From: @wicksipedia
Sent: Monday, June 24, 2024 11:20 AM
To: @andrewwaltosssw
Cc: @sethdaily; SSW Website v3 SSWWebsiteV3@ssw.com.au; @camillars; Jeoffrey Fischer [SSW] JeoffreyFischer@ssw.com.au
Subject: Re: New reasons prevent pages in a sitemap from being indexed on site https://www.ssw.com.au/
Description Search Console has identified that some pages on our website are not being indexed.
URL https://search.google.com/u/1/search-console/index?resource_id=https://www.ssw.com.au/&utm_source=wnc_20237597&utm_medium=gamma&utm_campaign=wnc_20237597&utm_content=msg_110624660&hl=en
Solution Fix the following issues.
Source: Website
Source: Google systems
Acceptance Criteria
Screenshots
Figure: 76k pages are not indexed