Closed tunetheweb closed 4 years ago
How about adding a pre-commit hook that updates the last_updated in the markdown files? Another option (perhaps simpler) is to use git itself as a part of npm run generate
, something like this:
>> git log -1 --pretty="format:%ci" content/en/2019/performance.md | cat
2019-11-05 22:33:38 -0800%
We can do something dodgy, like call out to the git cli from node?
I'm happy either way.
I'm not that familiar with git, pre-commit hooks, the cli...etc. so think it would be good if you could take this on @mikegeyser ?
Good thing is that as part of https://github.com/HTTPArchive/almanac.httparchive.org/pull/345 we will no longer just dump this date into the files as is, and we use JavaScript to read and format the date, so that gives us more options of converting the date to the format we need from that format that git spits out.
Is that date specific to the repo? Or will the value (and format) change depending on the local timezone of whoever runs npm run generate
? Hopefully it's repo-specific so it'll be the same for everyone.
The git commit has timezone information as a part of it.
Are we ok with the standard toISOString()
timestamps? So 2019-11-04T18:46:53.000Z
instead of 2019-11-04T18:46:53+00:00:00
.
Are we ok with the standard
toISOString()
timestamps? So2019-11-04T18:46:53.000Z
instead of2019-11-04T18:46:53+00:00:00
.
I think so. https://developers.google.com/search/docs/data-types/article#non-amp-sd has this to say:
The date and time the article was first published, in ISO 8601 format.
And the wikipedia link says both are fine.
For the ultimate test, I also tested it in the Structured Data Testing Tool (you can load a URL like my http2 chapter and then edit it) and it accepted both formats!
Reopening to add one feature request. I just ran npm run generate
for https://github.com/HTTPArchive/almanac.httparchive.org/pull/364/files and was surprised to see unrelated chapters getting their last modified
updated. Would it be possible to only update the timestamp on chapters that have been changed? A few reasons:
Was just looking at this. Suspect it's in part from other branches and other forks. As we update the .md files with each generation, they become part of the commit. Need a better way of doing this.
I think we should revert this commit for now.
Or at the very least comment out this line for now: https://github.com/HTTPArchive/almanac.httparchive.org/blob/a2f7576456e381045f336085286f61d89c234324/src/tools/generate/index.js#L1
I'm so sorry, I could have sworn it was working correctly. :(
Oh, I see where the flawed logic is now. :( Updating the file, even just to update the timestamp, will mean that becomes the latest commit - updating the timestamp. This approach could never work if we change source files, but could possibly work if we only use it to generate the timestamps in the html.
Again, I'm really sorry for the mistake.
I really think the way to do this is with a Git Action (that I'm not familiar with).
On merge to master:
The CI/CD dream perfect for automating future edits/corrections.
But not one for launch.
BTW if you change this line:
From this:
const command = `git log -1 --date=iso-strict-local ${path} | cat`;
To this:
const command = `git log -1 --date=iso-strict-local master ${path} | cat`;
Then at least it gets the master commit time rather than local branch commit time.
Three issues with this:
npm run generate
from this repo.npm run generate
after all commits.So not a great solution either but thought I'd throw it out there in case want to do this short term in next week or two.
@ethomson as you've been helpful on that other issue, have you any thoughts on how best to tackle this one (it was mentioned in that other issue you responded on, but it has much bigger scope that this little one).
To summarise above, we basically want to automatically update a timestamp in a file when a pull request is accepted for that file. The timestamp should be set to the date that the pull request was merged (or perhaps when submitted if that's easier) - but it should only be changed on files changed in that PR, and btw not all files have that timestamp meta data. Let me know if that doesn't make sense and can provide more details with examples.
Ignore this @ethomson. Thanks for you help on the other thread! will have a play.
As part of https://github.com/HTTPArchive/almanac.httparchive.org/pull/302 and https://github.com/HTTPArchive/almanac.httparchive.org/pull/308 we added Article Structured Data which depends on having a Published Date (required or it fails the Google Structured Data Testing Tool) and a Last Updated Date (recommended by the Google Structured Data Testing Tool). These have been implemented in the chapter markdown files as meta data. For example:
This allows for different timestamps for different publication dates and also for different timestamps for the translations which might be published at different dates - hence why I put it in the markdown.
At present these dates must be manually maintained which @rviscomi and I have been discussing.
I think that makes sense for the Published Date as this should be set once and forgotten about, but for the last_updated this is more error prone and people may forget to update them when changing the contents.
It should not be generated as part
npm run generate
as the contents my not have changed for every chapter (or even any chapter!) when it is run, and also generic formatting changes, which are not specific to this chapter (e.g. when base_chapter.html or chapter.html is updated) probably don't count as a "real edit".Requirements:
last_updated
meta data in chapter md files when a merge is accepted into that chapter’s markdown. Is it possible to do this as part of a GitHub action?The format currently has to be as per above example, however this will be converted as part of
npm run generate
as per https://github.com/HTTPArchive/almanac.httparchive.org/pull/345 so less of a concern (and we should use an easier format for authors!).Bonus points 1 - have a global "earliest publish date" per year and auto flag any date (This has been implemented as part of https://github.com/HTTPArchive/almanac.httparchive.org/pull/345published
orlast_updated
) before then (or when a date is not present?) to prevent errors creeping in and prevent the merge. Set this global constant to 2019-11-04T12:00:00+00:00:00 for now and then we can update it later once we have an actual publish date.@mikegeyser any thoughts on this?