civichackingagency / scangov

Government digital experience monitor
https://scangov.org
Extend Sitemap.xml definitions #95

Open mgifford opened 3 months ago

mgifford commented 3 months ago

Is your feature request related to a problem? Please describe.

Just looking at: https://gov-metadata.civichackingagency.org/docs/sitemap

The sitemaps need to be valid XML, but they should also list links that matter.

Some agencies might simply point to the press release section, while others list every page of the site. Sometimes there are multiple sitemaps for different purposes. Sometimes these are declared in the robots.txt file, but generally they aren't well defined.
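For the robots.txt case, here is a minimal sketch of how declared sitemaps could be discovered. It uses Python's standard `urllib.robotparser`; the helper name `sitemaps_from_robots` and the `agency.example` URLs are made up for illustration and are not part of scangov.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt: str) -> List[str]:
    """Return the URLs named in `Sitemap:` lines of a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


# Hypothetical robots.txt content, for illustration only.
example_robots = """\
User-agent: *
Disallow: /private/
Sitemap: https://agency.example/sitemap_index.xml
Sitemap: https://agency.example/news-sitemap.xml
"""

print(sitemaps_from_robots(example_robots))
```

A site with no `Sitemap:` line yields an empty list, which could itself be reported as "sitemap not declared in robots.txt".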

I don't know how you tell good from bad sitemap.xml files. One possibility is checking the age of the files. If a file hasn't been updated in more than a year, it's likely not a very useful list of URLs.
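One rough way to approximate the age check is to read the `<lastmod>` values the sitemap itself declares. The sketch below assumes the standard sitemaps.org namespace; the function names, the 365-day threshold, and the sample data are illustrative only. Note that `<lastmod>` is optional and self-reported, so this is a hint rather than proof.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def newest_lastmod(xml_text: str) -> Optional[date]:
    """Return the most recent <lastmod> date in a sitemap, or None if absent."""
    root = ET.fromstring(xml_text)
    newest = None
    for node in root.iter(NS + "lastmod"):
        text = (node.text or "").strip()
        try:
            # Use only the leading YYYY-MM-DD part; values that do not
            # parse as a full date are skipped.
            stamp = date.fromisoformat(text[:10])
        except ValueError:
            continue
        if newest is None or stamp > newest:
            newest = stamp
    return newest


def looks_stale(xml_text: str, today: date, max_age_days: int = 365) -> Optional[bool]:
    """True if the newest <lastmod> is older than max_age_days; None if unknown."""
    newest = newest_lastmod(xml_text)
    if newest is None:
        return None
    return (today - newest).days > max_age_days


# Hypothetical sitemap, for illustration only.
stale_sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://agency.example/a</loc><lastmod>2021-03-01</lastmod></url>
  <url><loc>https://agency.example/b</loc><lastmod>2022-06-15T10:00:00Z</lastmod></url>
</urlset>"""

print(newest_lastmod(stale_sample), looks_stale(stale_sample, today=date(2024, 1, 1)))
```

Returning `None` when no usable date is present keeps "unknown" separate from "stale", since many sitemaps omit `<lastmod>` entirely.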

Describe the solution you'd like

I'd like to see a count for the number of URLs.

https://www.cms.gov/sitemap.xml has 55 pages of URLs. That's a lot.

The White House has them broken down into a number of categories: https://www.whitehouse.gov/sitemap_index.xml

Only two of them are listed here, and they are not documented: https://www.whitehouse.gov/robots.txt

Here's another example where they have the content broken up: https://www.loc.gov/sitemap.xml

Which one would you choose to start with?
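As a starting point for the URL count, something like the following sketch could work. It distinguishes a plain `<urlset>` from a `<sitemapindex>` (the pattern used by the White House and LOC examples above) and returns the `<loc>` entries so they can be counted. The function name and sample data are hypothetical, and fetching child sitemaps or following pagination is left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def summarize_sitemap(xml_text: str) -> Tuple[str, List[str]]:
    """Return ("urlset" | "index", list of <loc> values) for one sitemap file."""
    root = ET.fromstring(xml_text)
    if root.tag == NS + "sitemapindex":
        kind, child = "index", "sitemap"
    elif root.tag == NS + "urlset":
        kind, child = "urlset", "url"
    else:
        raise ValueError(f"unexpected root element: {root.tag}")
    locs = [
        loc.text.strip()
        for loc in root.findall(f"{NS}{child}/{NS}loc")
        if loc.text
    ]
    return kind, locs


# Hypothetical sitemap files, for illustration only.
index_xml = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://agency.example/pages.xml</loc></sitemap>
  <sitemap><loc>https://agency.example/news.xml</loc></sitemap>
</sitemapindex>"""

urlset_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://agency.example/</loc></url>
  <url><loc>https://agency.example/about</loc></url>
  <url><loc>https://agency.example/contact</loc></url>
</urlset>"""

for sample in (index_xml, urlset_xml):
    kind, locs = summarize_sitemap(sample)
    print(kind, len(locs))
```

For an index, the count is the number of child sitemaps; a total URL count would mean fetching each child and summing its `urlset` length.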

Describe alternatives you've considered

I think there's space to look at date, or format. Is this a custom-built list of URLs, or is it something that is obviously just spit out from the Drupal Sitemap.xml module?

Additional context

On a different note: rather than bold, it might be better to make these all headings:

Heading 2

A heading is just easier to navigate than bold text.