git / git-scm.com

The git-scm.com website. Note that this repository is only for the website; issues with git itself should go to https://git-scm.com/community.
https://git-scm.com/
MIT License
2.16k stars 1.22k forks source link

Find a way to prevent broken book images #1391

Open pedrorijo91 opened 4 years ago

pedrorijo91 commented 4 years ago

As reported on #1390 , book images get broken from time to time.

I see 2 possible solutions:

smithmf commented 4 years ago

Took a stab at this one in pull request #1392. A simple rake task that will parse each img element from _ext_books.erb and if the URL cannot be retrieved, a warning is printed to the console. A cron job could be setup to execute this rake task

peff commented 4 years ago

One problem with checking via rake task is that these are all run via Heroku's scheduler, which doesn't provide any mechanism for feedback (i.e., to tell us that some links are broken). After searching a bit, the usual proposed solution seems to be using the exception-notifier gem to send an email. But that implies configuring the site to send email, which is notoriously annoying.

I wonder if we could instead have it open an issue in this repository. That would require setting up a bot GitHub account, but once done, it would be pretty reliable.

Just downloading the images into the repo does side-step all of this, but it might be nice to get to the root of the problem. In particular:

smithmf commented 4 years ago

I was addressing the case of a successful request with a non-image body and ran into some URLs returning a 301. We could follow the eventual location from the response header, but then there is the potential risk of infinite redirects. Adding all that logic seems a bit overkill for this solution, seeing as like @peff mentioned, this isn't a likely long term solution.

It might make more sense here to punt and take the low-tech solution of downloading the images to the repo and updating the src attributes. A more automated is definitely preferable, but takes more design as @peff suggests.

peff commented 4 years ago

I think any kind of link-checking will have to handle redirects at some point. It would be OK to just put a hard limit on looping (say, 10). I do wonder if there's a higher-level gem than net/http we could use that would do this sort of thing for us (likewise the https handling, which I notice you had to do manually).

JeremiahKamama commented 4 years ago

I would like to tackle this as my "good first issue"

jnareb commented 3 years ago

All cover images for Packt books are broken, as they lead to the images in (long expired) cache.

For example, the cover image for "Mastering Git" book is now

https://www.packtpub.com/media/catalog/product/cache/e4d64343b1bc593f1c5348fe05efa4a6/b/0/b03537_cover.png

"Mastering Git" cover

instead of

https://static.packt-cdn.com/products/9781783553754/cover/smaller

"Mastering Git" cover

pedrorijo91 commented 3 years ago

added https://github.com/git/git-scm.com/pull/1566 as a quick fix. but maybe we should just keep the images in the repo as we have with a few others (https://github.com/git/git-scm.com/tree/master/public/images/books)

peff commented 3 years ago

Thanks for jumping on this, @pedrorijo91. I'm not opposed to importing book images. They might get out of date with the source images, but if the publishers/authors care, they can open a PR. I'd rather have slightly old images than broken ones.

But I'm also OK to punt on that to see if this actually happens again. It looks like our packt links were probably not the right ones in the first place here.

js1darren commented 1 year ago

As reported on #1390 , book images get broken from time to time.

I see 2 possible solutions:

  • download book images to this repo (we may need to update them from time to time?)
  • actively check for broken images (cron rake task to check links are not broken?)
dscho commented 1 week ago

This could be done using Lychee, which is already integrated into the CI builds of https://github.com/git/git-scm.com/pull/1804. Therefore, once we accept this PR, at least the image links to git-scm.com will be all verified.