mlissner opened 2 months ago
We have the following sitemaps:
| Sitemap | Description | changefreq | lastmod | priority | limit |
|---|---|---|---|---|---|
| `/sitemap-oa.xml` | The oral argument sitemap | monthly | `obj.date_modified`¹ | 0.4 | 50,000 |
| `/sitemap-blocked-audio.xml` | Oral argument audio files that had `noindex` set on them in the last 30 days. This exists to encourage Google to crawl these items (so Google can stop showing them). | daily | `obj.date_modified` | 0.6 | 50,000 |
| `/sitemap-o.xml` | The opinions sitemap | yearly | `obj.date_modified` | 0.5 | 50,000 |
| `/sitemap-blocked-opinions.xml` | Blocked opinions, like the blocked oral argument sitemap | daily | `obj.date_modified` | 0.6 | 50,000 |
| `/sitemap-r.xml` | Federal dockets, limited to items filed in the last 30 days or with more than 10 views² | weekly | `obj.date_modified` | Scaled by view count, from 0.3 for unviewed dockets to 0.65 for more than 1,000 views | 50,000 |
| `/sitemap-blocked-dockets.xml` | Dockets that had `noindex` set on them in the last 30 days | daily | `obj.date_modified` | 0.6 | 50,000 |
| `/sitemap-p.xml` | Judges (aka People) | monthly | `obj.date_modified` | 0.5 | 50,000 |
| `/sitemap-disclosures.xml` | Judicial financial disclosures | yearly | `obj.date_modified` | 0.5 | None, apparently³ |
| `/sitemap-visualizations.xml` | Visualizations. This really doesn't matter. | yearly | `obj.date_modified` | 0.4 | None! |
| `/sitemap-simple.xml` | Simple flat pages, like help pages | Varies by page, but mostly "yearly" | Not set | Varies from 0.1 to 0.7 | n/a, only a couple dozen pages |
¹ This is the last time the item was updated in our DB, but it doesn't necessarily represent the last *relevant* update. The value often changes when something silly happens to an item, like its view count being incremented or its title being tweaked.
² The idea here is that if something is new, it should show up, and if it has more than ten views, it should show up. So this is a list of items that are new (and haven't gotten views yet) or that have gotten at least ten views within 30 days.
³ This is surprising! I don't know what having no limit would do, but the sitemap does seem to be paginated if you go to `?page=2`, or whatever.
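The `/sitemap-r.xml` row and note 2 describe an inclusion rule and a priority range for dockets. Here is a minimal Python sketch of both, under stated assumptions: `DocketStub` is a hypothetical stand-in for the docket model, the view count is treated as a lifetime total, and the curve between the two documented endpoints (0.3 at zero views, 0.65 at 1,000 or more) is an assumed logarithmic one, since the table doesn't say how the scaling is actually done.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta

MIN_PRIORITY = 0.3   # unviewed dockets (from the table)
MAX_PRIORITY = 0.65  # dockets with 1,000+ views (from the table)
VIEW_CAP = 1_000     # views at which priority stops growing


@dataclass
class DocketStub:
    """Hypothetical stand-in for a docket row, not CourtListener's model."""
    date_filed: date
    view_count: int


def include_in_sitemap(docket: DocketStub, today: date) -> bool:
    """Keep dockets filed in the last 30 days, or with more than 10 views."""
    recent = docket.date_filed >= today - timedelta(days=30)
    return recent or docket.view_count > 10


def docket_priority(view_count: int) -> float:
    """Assumed log scaling from 0.3 (0 views) up to 0.65 (>= 1,000 views)."""
    if view_count <= 0:
        return MIN_PRIORITY
    # Fraction of the way to the cap on a log scale, clamped to 1.0.
    fraction = min(math.log10(view_count + 1) / math.log10(VIEW_CAP + 1), 1.0)
    return round(MIN_PRIORITY + fraction * (MAX_PRIORITY - MIN_PRIORITY), 2)
```

A log curve is just one option; it spreads small view counts out more than a linear scale would, which may suit skewed traffic.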
Finally, `/sitemap.xml` is our sitemap index. In theory, it just links to all the others, using pagination where needed (e.g., `/sitemap-disclosures.xml`, `/sitemap-disclosures.xml?page=2`, etc.).
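With a limit of 50,000 URLs per sitemap page, the index needs one `<sitemap>` entry per page of each section. A minimal sketch of that bookkeeping is below. It assumes page 1 is served at the bare path and later pages at `?page=N` (as the `/sitemap-disclosures.xml` example suggests), and that per-section item counts are already known; the base URL and counts in the usage are hypothetical. The output uses the `<sitemapindex>` format from the sitemaps.org protocol.

```python
import math
from xml.etree import ElementTree as ET

SITEMAP_LIMIT = 50_000  # max URLs per sitemap file (sitemaps.org protocol)
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def page_urls(base_url: str, path: str, item_count: int) -> list[str]:
    """One URL per page: page 1 is the bare path, later pages use ?page=N."""
    pages = max(1, math.ceil(item_count / SITEMAP_LIMIT))
    urls = [f"{base_url}{path}"]
    urls += [f"{base_url}{path}?page={n}" for n in range(2, pages + 1)]
    return urls


def build_index(base_url: str, counts: dict[str, int]) -> str:
    """Render a <sitemapindex> document listing every page of every sitemap."""
    root = ET.Element("sitemapindex", xmlns=NS)
    for path, count in counts.items():
        for url in page_urls(base_url, path, count):
            sitemap = ET.SubElement(root, "sitemap")
            ET.SubElement(sitemap, "loc").text = url
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    xml = build_index(
        "https://example.com",
        {"/sitemap-disclosures.xml": 120_000, "/sitemap-p.xml": 10_000},
    )
    print(xml.count("<loc>"))  # 4: three disclosure pages + one people page
```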
Generating this page has two challenges:
There's an open issue from May of last year about our oral argument sitemap timing out: https://github.com/freelawproject/courtlistener/issues/2752. It still seems to be an issue, surprisingly.
I checked a few others:
We've engaged a company to help us rank better in generic searches, and one of their early findings is that our sitemaps need work.
A few suggestions they've made are:
They're not great at understanding our sitemaps, and frankly that's fair since the sitemaps aren't loading. So I'm going to make some notes here about how they work (and don't).