git / git-scm.com

The git-scm.com website. Note that this repository is only for the website; issues with git itself should go to https://git-scm.com/community.
https://git-scm.com/
MIT License

WIP: Convert git-scm.com to use GitHub Pages #942

Open spraints opened 7 years ago

spraints commented 7 years ago

Re @peff's recent ML message, I started playing around with converting this site to be a Jekyll site, so it can be hosted on GitHub Pages instead of heroku. https://github.com/spraints/git-scm.com is the new code, and http://pages-test-git-scm.pickardayune.com/ is the rendered site. So far, I've converted the home page and the "about" pages. My goal is for the pages site to be able to handle all of the same URLs that the current site knows about. Also, I'm not a designer, so I'm also shooting to make the site look exactly identical.

The home page was a fairly mechanical conversion. In the rails app, the "about" page is a single page that rewrites its URLs; I split it up so the links are normal links.

I'll continue to poke at it in my spare time. I'd also accept pull requests 😻 to my gh-pages branch. It should be pretty easy to figure out what needs to be done: find a broken link on http://pages-test-git-scm.pickardayune.com/, and copy content from the rails app in https://github.com/git/git-scm.com to the right place in the jekyll app. Rails helpers need to be changed into flat HTML or liquid tags. I hear that there are man-pages in the current site's database, which may take some more effort to convert.

pranitbauva1997 commented 7 years ago

@spraints Cool! I'd be glad to help. I don't know Rails, but I do know Jekyll. It would be awesome if you opened up issues on your repo.

maxlazio commented 7 years ago

@spraints I have a question about your choice of Jekyll: is there any specific reason why you chose it over Middleman?

connorshea commented 7 years ago

@maxlazio because Jekyll is better than Middleman :P Faster to compile, develop with, and generally more people know it. Plus it's the main SSG for GitHub Pages.

spraints commented 7 years ago

@maxlazio because of GitHub Pages. It's what I use for static sites, and seemed like a good fit here.

@pranitbauva1997 why does https://github.com/spraints/git-scm.com need issues? Pull requests are available, just open a pull request with spraints:gh-pages as the base branch.

maxlazio commented 7 years ago

@spraints Thanks for your answer, makes sense. I'll send my PRs to the fork and we can coordinate in issues here when necessary.

Sicaine commented 7 years ago

I like this idea very much. I think it's a good idea.

But if this becomes a reality, doesn't it make more sense to do a gradual migration? Like having a small PR that adds Jekyll, and then I/others can start to migrate it step by step?

spraints commented 7 years ago

But if this becomes a reality, doesn't it make more sense to do a gradual migration? Like having a small PR that adds Jekyll, and then I/others can start to migrate it step by step?

@Sicaine I would like to wait until the site is completely migrated before cutting this repo over to jekyll. The migration can go step-by-step in my fork. For now, I've been manually verifying that the migrated pages look the same on https://git-scm.com/ and http://pages-test-git-scm.pickardayune.com/. I have a small linter script, too.

Sicaine commented 7 years ago

@spraints I just think that it is harder to keep two repos up to date. But I also don't know how long it will take.

maxlazio commented 7 years ago

Had some time so I've sent a couple of PRs.

sxlijin commented 7 years ago

To be honest - and I don't mean to demean the effort - I don't think this migration is worth it. As I suggested in the email chain, I think it makes more sense to simply generate static assets from the Rails codebase and dump them to a separate GH Pages repo for hosting. The main reason for that is simply because there's already a large body of static assets that need to be generated independently (the man pages and Pro Git) which you would somehow need to add to a Jekyll-based solution. On top of that, although the site is mostly static, there are some dynamic back-end components that need to get fleshed out first, the biggest one being the ElasticSearch layer.

peff commented 7 years ago

@sxlijin Yes, I think the shortest way to get to a static site is to crawl the generated Rails site and dump it into a static repo for hosting (and in a sense, that's really what a CDN is doing; it's just a big cache). I'm not sure it's a good idea for long-term maintainability, though.

Going through Rails makes things a lot more complicated, mostly because the contents of this repository only tell half the story. Most of the actual site content is imported into the database, and its freshness is tracked in a totally separate way (and mostly a primitive one; updating the import code requires manually invalidating the database entries to pull in new versions of the manpages or book content).

So if you imagine that the conversion process is to occasionally run:

rails server
wget --mirror http://localhost:3000
git add -A
git commit -m 'regenerated static site'

That is heavily dependent on what happens to be in your database at the time you generate. We can work around that, but I think the end result will be simpler if we can just import directly to the filesystem and skip the round-trip through the database. And then we can also perhaps leverage filesystem-aware tools like make during the build.

In a sense, the bits of conversion that have happened so far aren't really the interesting part. There's some tedious work, but it's mostly just converting the templates from one form to another. The heavy lifting, I think, will be the conversion of the manpage and book importers.

I dunno. Maybe I am underestimating what value the database is providing to the display code.

Sicaine commented 7 years ago

I think the page itself is a great example of a good static website, independent of how often it is regenerated. A nightly build isn't hard, and when done properly the Heroku issue is gone.

But independent of any decision: as long as nothing in particular is decided, it's hard to tackle any issues or improvements. I don't want to start another approach while this one is going.

peff commented 7 years ago

The heavy lifting, I think, will be the conversion of the manpage and book importers.

The other hard part, I think, is deciding what to do with search. That's the actual dynamic part of the app. I don't know what kinds of 3rd-party solutions we could use there. Obviously linking to google.com?q=site:git-scm.com&search+terms is one way, but that is less nice than the search that's there (which does type-ahead suggestions). I think something like Google's Site Search is a good match, but I don't know the pricing or complexity there.

I'm pretty unfamiliar with this space in general. That's one of the reasons I was soliciting opinions on the list. :)

pranitbauva1997 commented 7 years ago

It is free and easy to integrate google custom search for a website

connorshea commented 7 years ago

@peff we can dynamically generate the site if we host it with GitLab Pages, which would automate the currently-manual update process (See this blog post for example). If you don't want to do this that's fine too :P

Also we can use Algolia DocSearch for free search, only problem might be that we'd have an Algolia logo in the search results.

sxlijin commented 7 years ago

To be clear: I'm a fan of Jekyll. I've used it before and I find it, in combination with GH pages, to be absolutely fantastic for its purpose.

Pending a resolution to the search question, though, I would be hesitant to move forward with it. (The man pages and book matter too, but since those are simply static assets, there are definitely ways around that.)

One friend I spoke to suggested using hosted ElasticSearch if we really want to keep it (which would be $40/mo from a cursory search), but my honest opinion is that ES is absolutely bonkers overkill for searching the site, especially since, as it stands, search doesn't even search page contents, just page titles. And to be frank, that seems like something that's very easily migrated to just one big JS file.
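To make the "one big JS file" idea a bit more concrete, here is a minimal sketch of title-only type-ahead matching. It is written in Python for brevity; a real version would presumably ship the title list as data and do the same filtering in the browser. The `Page` type, the `PAGES` list and the `suggest` function are all hypothetical names made up for illustration, not anything from the current site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    title: str
    url: str


# Hypothetical index; a real one would be generated at build time.
PAGES = [
    Page("git-config", "/docs/git-config"),
    Page("git-commit", "/docs/git-commit"),
    Page("git-rebase", "/docs/git-rebase"),
    Page("About", "/about"),
]


def suggest(query: str, pages: list[Page], limit: int = 5) -> list[Page]:
    """Return pages whose title contains the query, prefix matches first."""
    q = query.strip().lower()
    if not q:
        return []
    hits = [p for p in pages if q in p.title.lower()]
    # Prefix matches sort before plain substring matches, then alphabetically.
    hits.sort(key=lambda p: (not p.title.lower().startswith(q), p.title.lower()))
    return hits[:limit]
```

Calling `suggest("git-c", PAGES)` on every keystroke would give the type-ahead behaviour; since only titles are searched, the whole index stays small enough to load up front.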

jnavila commented 7 years ago

JFTR, I am in discussion with someone who is ready to translate the site into another language. Would migrating to Jekyll allow providing translated versions of the site?

sxlijin commented 7 years ago

There's no clear obvious solution (I'm having trouble building the json-1.8.3 gem locally for some reason, so I can't check what gems are bundled in github-pages; I did turn this up while googling) that I can find which is compatible with GH pages.

That being said, Jekyll allows you to have data files (stored in repo_root/_data/), the contents of which can be referenced from the page templates themselves. As an example, I manage this website and use one such file to control the links shown in the navbar, so I imagine it's easily possible to roll a custom framework around the _data files to support i18n work.

peff commented 7 years ago

I agree that search is a big wildcard on the static-site transition. Moving the first few individual pages over has been a good validation that the end result looks good (both rendered and in code). But the next step is probably validating those other assumptions (search, imports, etc). I had always assumed we would move to something like Google's site search, but it's possible the "one big JS file" approach might be even simpler.

I hadn't given i18n much thought, as the site is not currently translated. But it does seem like an obvious future direction to go in. I'll ping GH pages folks and see what their thoughts are on the state of the art.

sxlijin commented 7 years ago

Well, Google Site Search definitely isn't an option anymore.

peff commented 7 years ago

I think they're just discontinuing the on-premise Enterprise product. The thing we would use is probably https://cse.google.com/cse/. I hear they are also getting rid of the paid version of that, but I'm not sure we would have used it anyway. That leaves the "ad-supported" search. Which I guess is how normal Google works, but I'm not sure how ugly or intrusive it is on the site.

pedrorijo91 commented 6 years ago

searching using algolia seems to be a solution worth exploring if we ever go into jekyll: https://blog.frankel.ch/search-static-website/

dscho commented 1 year ago

👋

I've been interested in this project ever since it was announced that Heroku was ending the free tier that we use for https://git-scm.com/. We currently have a sponsor, but it would make the most sense to spend that money more wisely and to make https://git-scm.com/ a nice showcase for how powerful GitHub Pages is.

Thank you @spraints for starting your incredible work on this, it gave me quite the head start!

This is where I'm at right now:

Note: This is just a start. There is still quite a bit to be done.

Plenty of things still to do 😉

peff commented 1 year ago

Just a few thoughts, as somebody who has thought about this off and on in the intervening 5 years:

Just my two cents, of course. I haven't looked carefully at the problem in a while.

dscho commented 1 year ago

  • we'll probably want custom scripts

Yes, we will. Things like importing/pre-generating the ProGit book cannot be done by Jekyll.

What can be done by Jekyll is to take generated .html files and integrate them into the look-and-feel of git-scm.com.

ttaylorr commented 1 year ago

  • I don't think we strictly need scheduled jobs to do updates for things like manpage imports and updating the downloads list.

I agree, but it would be very nice to have. I vaguely recalled that GitHub makes this somewhat easy to do, and indeed, we can schedule a workflow to run every night (or on any schedule) here. Such workflows can also be triggered by "pressing a button", which is convenient to do at release time.

In fact, it would be really nice to have those workflows automatically get kicked off whenever new tag(s) are pushed to git/git. But I don't know if such cross-repo monitoring is possible.

sxlijin commented 1 year ago

In fact, it would be really nice to have those workflows automatically get kicked off whenever new tag(s) are pushed to git/git. But I don't know if such cross-repo monitoring is possible.

I think you'd have to do it by setting up a workflow on git/git that triggers a workflow here; I'm not aware of any mechanism to subscribe to a different repo and I don't see one by skimming the workflow triggers docs.

It would be pretty easy to schedule an hourly or nightly job on git-scm to check for tag pushes to git/git though.
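A rough sketch of what that check could look like, assuming the scheduled job captures the output of `git ls-remote --tags` for git/git and keeps a record of the tags it has already processed. The function names and the idea of passing in a `known` set are assumptions for illustration; how the known tags get stored between runs is left open.

```python
def parse_tags(ls_remote_output: str) -> set[str]:
    """Extract tag names from the output of `git ls-remote --tags`."""
    tags = set()
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        _sha, ref = line.strip().split(maxsplit=1)
        if not ref.startswith("refs/tags/"):
            continue
        name = ref[len("refs/tags/"):]
        # Annotated tags are listed twice; the peeled entry ends in "^{}".
        if name.endswith("^{}"):
            name = name[:-3]
        tags.add(name)
    return tags


def new_tags(known: set[str], ls_remote_output: str) -> list[str]:
    """Return tags present on the remote that have not been seen before."""
    return sorted(parse_tags(ls_remote_output) - known)
```

If `new_tags(...)` comes back non-empty, the job would go on to run the import; otherwise it can exit early. Note that the result is sorted as plain strings, not by version.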

jnavila commented 1 year ago

I guess the guts of the processing of sources of progit and the manpages can be kept the same. One roadblock, however, is that instead of feeding a DB, we want to spit out files, add content, commit and push, while being in a GitHub workflow. Not sure this is easy...

The other issue, as @dscho pointed out, is that unlike progit, the git manpages are versioned, at least in English, and need to be flattened in the filesystem (instead of relying on the hash in a DB).

dscho commented 1 year ago

But I don't know if such cross-repo monitoring is possible.

It's not possible right now with GitHub workflows as-are, not unless you convince the Git maintainer to integrate a workflow whose sole beneficiary is git-scm.com.

However, there is a very easy way to do this: register a webhook with git/git pointing to an Azure Function that uses a PAT to trigger the workflow_dispatch workflow in git-scm.com. It's no wizardry, really, it is very similar to what we do with GitGitGadget (with minor variations, it's not a webhook but a GitHub App, and it's not a GitHub workflow but an Azure Pipeline that is triggered).

I guess the guts of the processing of sources of progit and the manpages can be kept the same.

That matches my understanding.

One roadblock, however, is that instead of feeding a DB, we want to spit out files, add content, commit and push, while being in a GitHub workflow. Not sure this is easy...

It is actually very easy. All you need to do is to mark the workflow in question as being permitted to write contents.

the git manpages are versioned, at least in English, and need to be flattened in the filesystem (instead of relying on the hash in a DB).

They do not really need to be flattened, but we do need to put them into the correct locations. As pointed out in my earlier comment, the URLs for the versioned manual pages add a suffix /<version> to https://git-scm.com/docs/<command>. What is tricky about this is that docs/git-config is kind of a file, but also kind of a directory, in the Rails app. We will need to figure out how to handle this in Jekyll (it might be necessary to write the current version into docs/git-config/index.html and have docs/git-config auto-resolve to that file, somehow).
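As a small illustration of the "file but also directory" layout, here is one possible mapping from a command and optional version to an output path. It assumes the static host serves `index.html` when a directory URL is requested, which is only one way to resolve this; `output_path` is a hypothetical helper, not part of any existing import script.

```python
from pathlib import PurePosixPath
from typing import Optional


def output_path(command: str, version: Optional[str] = None) -> PurePosixPath:
    """Where a rendered manual page would be written inside the site tree."""
    base = PurePosixPath("docs") / command
    if version is None:
        # Current version: docs/<command>/index.html backs /docs/<command>.
        return base / "index.html"
    # Older versions live one level deeper: /docs/<command>/<version>.
    return base / version / "index.html"
```

With this layout, `docs/git-config` is always a directory on disk, and both `/docs/git-config` and `/docs/git-config/2.39.2` resolve through an `index.html`.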

peff commented 1 year ago

My biggest wish for the import/build system is that I be able to run it locally, and that the results be available in the filesystem for local inspection. And ideally also intermediate results, like manpages that have been rendered via asciidoctor but not yet Jekyll-ified (or whatever template / presentation system we use).

The reason is that most of our bugs (including the one I introduced a few days ago!) come from refactoring or fixing a problem in the import code, which have unforeseen effects. I.e., they are things that could be caught easily with a simple diff of the results before and after the change, but our current build flow makes that really painful to do.

I suspect everyone is on board with that direction, and hopefully it just falls out naturally from any static-site build plan, but that was what I was trying to get at earlier.
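The "simple diff of the results before and after the change" could be as small as the following sketch, assuming the build can be pointed at two output directories (say, one built from the main branch and one from a topic branch). The function names are made up for illustration, and plain `diff -r` would do much the same job.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to a SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def diff_builds(before: Path, after: Path) -> dict[str, list[str]]:
    """Report which files were added, removed or changed between two builds."""
    old, new = snapshot(before), snapshot(after)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

An unexpectedly long "changed" list after a supposedly harmless refactoring of the import code is exactly the kind of signal this is meant to surface.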

dscho commented 1 year ago

@spraints you will be delighted to see that the manual pages now render, too, see e.g. https://dscho.github.io/git-scm.com/docs/git-config (note that I haven't yet taken care of translated manual pages, that's why there is a funny "<%= render 'doc/languages' %>", still).

The way I did it is similar to what I did with the ProGit book at https://dscho.github.io/git-scm.com/book/en/v2/: I committed the generated manual pages with appropriate front matter. For the manual pages' versions, I made liberal use of the redirect_from Jekyll extension to make things like /docs/git-config/2.39.2 work. And _data/docs.yml contains a YML-ified kind of a database, and the expanded AsciiDoc sources are stored in _generated-asciidoc/ with their SHA-1 as filename.

It's admittedly a bit excessive: 152 distinct Git versions are included in total, with git-config being the winner with the most per-page versions: there are 80 (in words: eighty) distinct versions of the git-config page.

This excessiveness comes at a price: before including the manual pages (and rendering only the English version of the ProGit book), the Pages build took ~1½ minutes and produced a ~65MB artifact. With the many versions of the English manual pages, the build takes ~6 minutes and produces a 420MB artifact.

While this would allow us to reach 1:1 parity with the Rails app, I guess we can easily declare including this many versions insanity and limit the number of versions we serve on git-scm.com to a reasonable number, and skip the minor versions (i.e. for each major version take only the latest minor version, as is already done for versions older than v2.17.0). That should make things more manageable again.
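A sketch of that pruning rule, under the assumption that "major version" here means the first two components (e.g. 2.39) and "minor version" the third (e.g. 2.39.2), and that version strings are purely numeric with an optional leading "v". `latest_per_series` is a hypothetical helper name.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn "v2.39.2" or "2.39.2" into (2, 39, 2) for numeric comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def latest_per_series(versions: list[str]) -> list[str]:
    """Keep only the highest version within each X.Y series."""
    best: dict[tuple[int, ...], str] = {}
    for v in versions:
        key = parse(v)[:2]  # assumption: the series is the first two components
        if key not in best or parse(v) > parse(best[key]):
            best[key] = v
    return [best[k] for k in sorted(best)]
```

For example, given 2.39.0, 2.39.1 and 2.39.2, only 2.39.2 would be kept. Release candidates or other non-numeric suffixes are not handled by this sketch.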

@spraints thank you for starting this all!

dscho commented 11 months ago

Update: I've converted everything to a Hugo site, and there are now only very few loose ends to tie up. The draft PR can be adored here: https://github.com/git/git-scm.com/pull/1804

dscho commented 6 days ago

@spraints would you agree that we can close this here ticket in favor of 1804?