git / git-scm.com

The git-scm.com website. Note that this repository is only for the website; issues with git itself should go to https://git-scm.com/community.
https://git-scm.com/
MIT License

Lack of Google results for manual pages #252

Closed rtomayko closed 11 years ago

rtomayko commented 11 years ago

I'm surprised by how few results I get for git-scm.com when searching for manual page names, and I don't have a good sense of why. For example:

The git-scm.com URLs for these match the search, and the <h1> also matches. It seems like we should be seeing these, given that other git-scm.com results are coming back with high rankings.

Is it possible that we're doing something that's causing these pages to get blacklisted by Google? Like rendering different versions to Googlebot somehow, or doing something fancy with JS that would hide the content from non-browser UAs?

rtomayko commented 11 years ago

I don't get it.

randomecho commented 11 years ago

As a rough straw poll, running the URLs through Stack Overflow (which puts nofollow on external links) shows that the Git Bisect page gets linked more often than the blame man page.

5 links vs. 2

Extrapolate that, wildly, to blogs and other guides that link to the kernel.org man page for blame instead, and it may just be a situation outside of your control.

randomecho commented 11 years ago

It also looks like the lack of better-worded <title> tags, as mentioned in #151, doesn't help things.

The man page's title is

Git

whereas the debugging page that is leapfrogging here has

Git - Debugging with Git

And it was addressed back in #201 as well.
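To check which pages still serve the bare "Git" title, one could pull the <title> out of each page's HTML. Below is a minimal sketch using Python's standard `html.parser` module; the names `TitleParser`, `page_title` and `has_generic_title` are hypothetical, and fetching the pages is left out.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    """Return the stripped <title> text of an HTML document, or "" if absent."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


def has_generic_title(html: str) -> bool:
    """True when the page title is just "Git"/"git", the problem discussed here."""
    return page_title(html).lower() == "git"
```

For example, `page_title("<title>Git - Debugging with Git</title>")` returns `"Git - Debugging with Git"`, while a man page still titled `Git` makes `has_generic_title` return `True`.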

fingolfin commented 11 years ago

Yeah, #151 (and its duplicate #201) were fixed quite a long time ago. But these, like many fixes to the AsciiDoc conversion, require the database to be rebuilt in order to take effect. Doing that would resolve about 10 of the issues reported here; see my meta-issue #241. Sadly, so far nobody with write access to gitscm-next has even acknowledged this problem publicly, let alone fixed it :-(...

schacon commented 11 years ago

Yes, as I believe I've said to Max before, I've been working on this on and off. Basically, to rebuild the database I need to pull down a copy locally, rebuild it, and push it back up. I need to figure out an easier way to do this, but for now this is the only way that won't cause regressions in other parts of the site, specifically the book. I actually worked on this for a bit today, but as at other times, I got caught up in other stuff.

There are also some caching issues affecting this, including, I believe, the title issue, which is what you're actually worried about here; I don't think rebuilding the db will fix that. I haven't had time to dig into Heroku caching details or how we're doing it wrong.

schacon commented 11 years ago

The titles should be generated properly now. Let's see if that helps things.

fingolfin commented 11 years ago

@schacon If you told me that before, then I'm sorry, but I must have totally missed it in the past months... :-). Anyway, I don't want to sound ungrateful; on the contrary, my intentions are good, and I am very grateful that you have now found some time and that things are running a lot more smoothly, at least for the time being. Thanks!

And thanks for the explanation. Caching definitely seems to be part of the issue. In my local gitscm instance I also started to see stale data, and had to delete tmp/cache before restarting the Rails server to make things work. So it seems to be a general issue with how RoR caches things (but then I know very little about RoR, so this might again be quite off).

Sadly, titles for most reference pages are still "git", even though they work just fine in my local gitscm instance. Interestingly, though, http://git-scm.com/docs/git-stage/1.8.1 shows a "good" page title, while http://git-scm.com/docs/git-stage does not. Neither does, for example, http://git-scm.com/docs/git-stage/1.7.12.
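The difference between versioned and unversioned URLs is consistent with a URL-keyed cache still holding pages rendered before the title fix. The sketch below is a hypothetical illustration of that failure mode, not gitscm-next's actual caching code: `PageCache`, `old_render` and `new_render` are made-up names, and the "Git - <name> Documentation" title format is assumed.

```python
from typing import Callable, Dict


class PageCache:
    """Hypothetical URL-keyed page cache (not the real gitscm-next code)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def fetch(self, url: str, render: Callable[[str], str]) -> str:
        # Render only on a cache miss. A hit returns whatever was stored
        # earlier, even if the renderer has been fixed since then.
        if url not in self._store:
            self._store[url] = render(url)
        return self._store[url]

    def clear(self) -> None:
        # Roughly analogous to deleting tmp/cache and restarting the server.
        self._store.clear()


def old_render(url: str) -> str:
    # Renderer before the fix: every page gets the generic title.
    return "<title>Git</title>"


def new_render(url: str) -> str:
    # Renderer after the fix: "/docs/git-stage/1.8.1" -> "git-stage".
    name = url.strip("/").split("/")[1]
    return f"<title>Git - {name} Documentation</title>"


if __name__ == "__main__":
    cache = PageCache()
    cache.fetch("/docs/git-stage", old_render)  # cached before the fix
    print(cache.fetch("/docs/git-stage", new_render))        # still the stale "Git" title
    print(cache.fetch("/docs/git-stage/1.8.1", new_render))  # first request after the fix
    cache.clear()
    print(cache.fetch("/docs/git-stage", new_render))        # correct once the cache is cleared
```

Clearing the whole cache is the blunt fix; expiring entries or including a renderer version in the cache key would keep old pages from outliving a change to how titles are generated.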

fingolfin commented 11 years ago

BTW, does anybody know who maintains http://www.kernel.org/pub/software/scm/git/docs/ ? Last update was May 2012. I tried to contact the kernel.org staff to find out, but got no response.

fingolfin commented 11 years ago

Right now I see the page title "Git" everywhere. So this issue is not resolved. :-(