Open marekcierny opened 8 years ago
Ultimately, I think C) meta "canonical" should be added to every page to resolve any potential duplicate content we might miss... (E.g. tracking campaigns and traffic sources)
I wrote a simple PHP function that rewrites any url into a "canonical url". canonical.TXT
If "echo get_canonical_meta($url)" can be added to every page, it can help us explain our duplicate content to search engines.
Unfortunately, the application is written in Python, so we cannot include your script in every page view directly. On the other hand, I assume we are able to rewrite it in Python (@slaweet?)
I added canonical urls (https://github.com/adaptive-learning/anatomy/commit/f9a74540f5909696570687e7e6145c312b413bd1).
I'm just stripping the query string (everything after ?). I changed /overview/?tab=location to /overview/tab/location because of it.
I didn't implement the part that changes the domain in the canonical URL, because it would never be executed: the 301 redirect runs first, and by then we are already on the correct domain.
As for disallowing /en/ and /cs/, I removed it from robots.txt, but I don't see why it should influence the page rank of any page other than the ones with /en/ and /cs/, which we don't want in search results anyway. And IMO we don't want Google to see the redirect, but directly the alternative language version through <link rel="alternate" ...
OK. As for disallowing /en/ and /cs/ in robots: http://webmasters.stackexchange.com/questions/54240/is-it-safe-to-block-redirected-but-still-linked-urls-with-robots-txt (In general, my understanding is that disallowing robots from any URL we link to within our site is not good.)
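To make the robots.txt concern concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules below are an assumed reconstruction of the removed Disallow lines, not a copy of the real robots.txt. It shows that a crawler obeying such rules would not be allowed to fetch the /en/ URL at all, so it could never see the redirect behind it.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules, reconstructed from the discussion (not the real file).
ASSUMED_RULES = [
    "User-agent: *",
    "Disallow: /en/",
    "Disallow: /cs/",
]

parser = RobotFileParser()
parser.parse(ASSUMED_RULES)

# A language-prefixed URL is blocked, so the crawler never follows its redirect.
blocked = parser.can_fetch("*", "https://anatom.cz/en/overview/")
# A URL without the prefix is still crawlable.
allowed = parser.can_fetch("*", "https://anatom.cz/overview/")

print(blocked, allowed)
```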
The canonical form of the URL is also related to the <link rel="alternate" sitemap: only canonical forms of URLs should be linked as another language version.
For example, on https://anatom.cz/practice//, the canonical URL is https://anatom.cz/practice/, and the alternate language URLs should also end with only one /.
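A rough sketch of that idea in Python: build the alternate-language tags from the cleaned path rather than from the raw request path. The function name `alternate_link_tags` and the language-to-domain mapping are assumptions for illustration only; the real project may map languages to URLs differently.

```python
import re
from html import escape


def alternate_link_tags(path, domains_by_lang):
    """Build <link rel="alternate"> tags that point at canonical paths only.

    `domains_by_lang` is a hypothetical mapping such as {"cs": "https://anatom.cz"}.
    """
    # Drop the query string and collapse repeated slashes, as for the canonical URL.
    clean_path = re.sub(r"/{2,}", "/", path.split("?", 1)[0])
    tags = []
    for lang, domain in sorted(domains_by_lang.items()):
        href = domain.rstrip("/") + clean_path
        tags.append(
            '<link rel="alternate" hreflang="{}" href="{}">'.format(
                escape(lang, quote=True), escape(href, quote=True)
            )
        )
    return tags


# Assumed mapping of languages to domains, for illustration only.
for tag in alternate_link_tags(
    "/practice//",
    {"cs": "https://anatom.cz", "en": "https://practiceanatomy.com"},
):
    print(tag)
```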
I've updated <link rel="alternate" (https://github.com/adaptive-learning/anatomy/commit/1d33303af3718a526b0f67a16b8def5436faafcf), even though I don't think it matters what is on the non-canonical pages, as Google is only going to look at (index) the canonical ones.
I've also added '//' -> '/' replacement to canonical url.
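For readers following along, the two rules described here (strip the query string, replace '//' with '/') can be sketched in Python roughly as follows. This is not the code from the linked commits. The name `canonical_url` is hypothetical, and dropping the fragment is an extra assumption of this sketch.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Return url without its query string and with single slashes in the path."""
    parts = urlsplit(url)
    # Collapse any run of slashes in the path; an empty path becomes "/".
    path = re.sub(r"/{2,}", "/", parts.path) or "/"
    # Empty query and fragment: the fragment removal is an assumption of this sketch.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(canonical_url("https://anatom.cz/practice//"))
# -> https://anatom.cz/practice/
print(canonical_url("https://anatom.cz/overview/?utm_source=newsletter"))
# -> https://anatom.cz/overview/
```

Only the path is rewritten, so the "//" after the scheme is left alone.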
Thank you, Víťo. Do you use www.google.com/webmasters/tools/ to check for SEO warnings/errors? (I think it's a great tool, especially as we want to add more languages and content in the future.) I've just noticed that when logged in, view-source:https://anatom.cz/ shows the canonical address "https://anatom.cz/overview/". But when logged out, it's correct.
I might be too picky, but other potential duplicate content is
view-source:https://anatom.cz/ for logged in users actually redirects to view-source:https://anatom.cz/overview (notice address bar). Hopefully, search engines cannot log in :-)
I use www.google.com/webmasters/tools/ every now and then, I haven't noticed any SEO warnings or errors there. I've linked Webmaster tools with GA, so it probably displays the errors in GA as well.
Ad 4 and 5: I see the problem, I'll have to think about how to solve it technically.
Although there is no link to such a page, I'm not sure whether this could be a problem for search engines or users/brand/security: https://anatom.cz/overview/V%C3%ADt%C3%A1%20v%C3%A1s%20blbe%C4%8Dek https://anatom.cz/view/02/V%C3%ADt%C3%A1%20v%C3%A1s%20blbe%C4%8Dek (a random URL parameter is recognized as canonical, and the random text is displayed in the heading)
Re https://github.com/adaptive-learning/anatomy/issues/19#issuecomment-171920436: Good catch. That URL is actually a link to view knowledge of a user, e.g. https://anatom.cz/overview/slaweet https://anatom.cz/overview/cierny.m
The problem is that we don't check whether the given string is a valid username. If it is not, the page should return an error.
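A framework-independent sketch of such a check, in Python. The function `overview_status` and the in-memory set of usernames are hypothetical stand-ins; in the real application the lookup would go through the user database and the web framework's own 404 response.

```python
from http import HTTPStatus
from urllib.parse import unquote


def overview_status(url_segment, known_usernames):
    """Return the HTTP status for /overview/<url_segment>."""
    # Decode percent-escapes so the comparison uses the actual text.
    username = unquote(url_segment)
    if username in known_usernames:
        return HTTPStatus.OK
    # Unknown strings get an error instead of a page that echoes them.
    return HTTPStatus.NOT_FOUND


# Hypothetical user set, taken from the example URLs in this thread.
users = {"slaweet", "cierny.m"}
print(overview_status("slaweet", users))
print(overview_status("V%C3%ADt%C3%A1%20v%C3%A1s%20blbe%C4%8Dek", users))
```

Returning an error status also keeps such URLs out of the index, since there is then no page for a canonical tag to describe.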
Víťo, when I suggested making a separate URL for /overview/?tab=location so that the crawler could see our main content tree, I didn't know that Google can understand AJAX. Now I think it wasn't a good idea from the start, and we might be better off without it. Sorry for making it complicated.
Marku, I don't think Google's AJAX crawling scheme is applicable here. Anything we want to appear in search results (like /overview/?tab=location) has to be on a separate URL.
And FYI, your example with "Vítá vás blbeček" has been indexed by Google, as Google crawled our GitHub :-) FYI no. 2: the problem with SEO in GA was just a reporting issue, caused by the http -> https migration in December. Our impressions moved to the https version of anatom.cz, and those were not listed.
First, I am concerned that we have very similar content (and identical ) when a user views an image under different chapters/body parts (e.g. practiceanatomy.com/view/UE/image/casti-lidskeho-telasvg and practiceanatomy.com/view/LE/image/casti-lidskeho-telasvg). Can we change the URL to practiceanatomy.com/view/LE/#image/casti-lidskeho-telasvg or practiceanatomy.com/view/LE/#image/5 ?
Second, I've found a simple SEO guide, and there are several things we do not do yet:
Several examples of potential duplicate content exist:
Duplicate content should be a) avoided if possible, b) resolved by a 301 redirect, or c) resolved by <link rel="canonical" (https://support.google.com/webmasters/answer/139066).
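As a small illustration of option c) in Python, the tag itself could be rendered like this. `canonical_link_tag` is a hypothetical helper, assumed here as a Python counterpart of get_canonical_meta from the attached PHP script; it expects an already canonicalized URL.

```python
from html import escape


def canonical_link_tag(canonical_href):
    """Render the <link rel="canonical"> tag for an already canonical URL."""
    # Escape the URL so characters like & or " cannot break the attribute.
    return '<link rel="canonical" href="{}">'.format(escape(canonical_href, quote=True))


print(canonical_link_tag("https://anatom.cz/overview/"))
# -> <link rel="canonical" href="https://anatom.cz/overview/">
```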