Closed by klin2020 1 month ago
:mag: Preview in Federalist
@klin2020 Thanks for the detailed explanation. Some questions:
Which meta fields will search.gov use when we re-index after we remove the `lastmod` field?
We don't set a `lastmod` field in the markdown for each page. Is this something we should consider adding in the future if we want to improve our sitemap?
Could we use the `.Params.date` field as the next best option for setting the `lastmod` field?
<lastmod>{{ safeHTML ( .Params.date "2006-01-02T15:04:05-07:00" ) }}</lastmod>{{ end }}{{ with .Sitemap.ChangeFreq }}
I was wondering this as well. I was expecting the last modified date to default to the date published.
Hi @nick-mon1 @RileySeaburg @klin2020
To be clear, I'm not sure we need to remove `<lastmod>` to have the site reindexed.
We prefer documents that are fresh. Anything published or updated in the past 30 days is considered fresh. After that, we use a Gaussian decay function to demote documents, so that the older a document is, the more it is demoted. When documents are 5 years old or older, we consider them to be equally old and do not demote further. We use either the article:modified_time on an individual page, or that page’s `<lastmod>` date from the sitemap, whichever is more recent. If there is only an article:published_time for a given page, we use that date.
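To make the quoted rules concrete, here is a minimal Python sketch of how they could combine. It is not Search.gov's implementation: the function names and the `SIGMA_DAYS` width are assumptions made for illustration, and only the 30-day window, the 5-year cap, and the date-selection rule come from the text above.

```python
import math
from datetime import datetime, timedelta, timezone
from typing import Optional

FRESH_DAYS = 30           # from the quoted text: fresh if within 30 days
MAX_AGE_DAYS = 5 * 365    # from the quoted text: no further demotion after 5 years
SIGMA_DAYS = 365.0        # ASSUMPTION: the real decay width is not stated


def effective_date(modified_time: Optional[datetime],
                   sitemap_lastmod: Optional[datetime],
                   published_time: Optional[datetime]) -> Optional[datetime]:
    """Pick the more recent of modified_time and lastmod, else published_time."""
    candidates = [d for d in (modified_time, sitemap_lastmod) if d is not None]
    if candidates:
        return max(candidates)
    return published_time


def freshness_weight(doc_date: datetime, now: datetime) -> float:
    """Return 1.0 for fresh documents, then a Gaussian decay capped at 5 years."""
    age_days = (now - doc_date).total_seconds() / 86400
    if age_days <= FRESH_DAYS:
        return 1.0
    overdue = min(age_days, MAX_AGE_DAYS) - FRESH_DAYS
    return math.exp(-(overdue ** 2) / (2 * SIGMA_DAYS ** 2))


if __name__ == "__main__":
    now = datetime(2021, 6, 1, tzinfo=timezone.utc)
    published = datetime(2015, 1, 1, tzinfo=timezone.utc)
    # With lastmod set to the build date, an old article is treated as fresh.
    print(freshness_weight(effective_date(None, now, published), now))
    # With no lastmod, the publish date is used and the article is demoted.
    print(freshness_weight(effective_date(None, None, published), now))
```

Under these assumptions, a `<lastmod>` equal to the build date always wins the "more recent" comparison, so every page is scored as fresh regardless of when it was published.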
Unless I'm misunderstanding something, updating the `<lastmod>` tags to reflect the content publish date, and then requesting a reindex, should fix this issue.
Please explain the proper procedure if I am incorrect.
If I am not, please update the `<lastmod>` tag.
@nick-mon1 @RileySeaburg Re-introduced lastmod tag with page date.
Removed lastmod tag for review again @nick-mon1 @RileySeaburg
I'm going to merge this so we can test the re-index today.
@mejiaj there will be another PR where the
Summary
Search on DG currently returns outdated articles that bury more recent ones.
Search.gov search results are based on a ranking algorithm that looks at the `<lastmod>` tag in a website's sitemap or a page's date, whichever is more recent. Our sitemap currently sets the `<lastmod>` tag to the current date on every build, which leads the ranking algorithm to weigh every page on DG equally rather than by its actual publish date.
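One way to confirm the symptom described above is to count how many sitemap entries share each `<lastmod>` day. The following is a rough Python sketch using only the standard library; the sample sitemap, its `example.gov` URLs, and the helper names are hypothetical and stand in for the real DG sitemap.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample: two pages stamped with the same build date.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.gov/a/</loc><lastmod>2021-06-01T10:00:00-04:00</lastmod></url>
  <url><loc>https://example.gov/b/</loc><lastmod>2021-06-01T10:00:00-04:00</lastmod></url>
</urlset>"""


def lastmod_days(sitemap_xml: str) -> Counter:
    """Count sitemap entries per lastmod day (the YYYY-MM-DD prefix)."""
    root = ET.fromstring(sitemap_xml)
    days = Counter()
    for url in root.findall("sm:url", NS):
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        if lastmod:
            days[lastmod.strip()[:10]] += 1
    return days


def all_pages_share_one_day(days: Counter) -> bool:
    """True when more than one page exists and all carry the same lastmod day."""
    return sum(days.values()) > 1 and len(days) == 1


if __name__ == "__main__":
    counts = lastmod_days(SAMPLE_SITEMAP)
    print(counts)
    print(all_pages_share_one_day(counts))
```

If every entry falls on a single day, the dates most likely reflect the build time rather than when each page was published or changed.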
Solution
Remove the `<lastmod>` tag in the DG sitemap build, so that when we re-index DG, the re-index will use the page metadata for its proper date.
Once the re-index occurs, we can edit the `<lastmod>` tag to reflect the page's publish date, rather than the current date.
Screenshots
Current sitemap (including `<lastmod>`). Every entry shows the same date, which causes issues with the ranking algorithm
Proposed change to sitemap (temporarily remove `<lastmod>` for Search.gov re-indexing)