danpros / htmly

Simple and fast databaseless PHP blogging platform, and Flat-File CMS
https://www.htmly.com
GNU General Public License v2.0

[BUG REPORT][FEATURE REQUEST] canonical url on paginated pages when blog URL enabled #800

Closed Joduai closed 1 month ago

Joduai commented 2 months ago

Today I got a notice from Google Search Console that other pages have the same canonical URL. Among the examples I saw a https://website/blog?page=1 URL, and after looking at the page source I saw that this paginated page has the same canonical URL as /blog:

<link rel="canonical" href="https://website/blog" />

Quoting https://developers.google.com/search/docs/specialty/ecommerce/pagination-and-incremental-page-loading#use-urls-correctly: "Don't use the first page of a paginated sequence as the canonical page. Instead, give each page its own canonical URL."

I chose to append ?page=n to the canonical URL on paginated pages. The change is in htmly.php, inside:

// Show various page (top-level), admin, login, sitemap, static page.
get('/:static', function ($static) {
....
} elseif ($static === 'blog') {
....
        if ($page > 1) {
            $bCanonicalPageNum = "?page=".from($_GET, 'page');
        } else $bCanonicalPageNum = NULL;

        render($pview, array(
            'title' => generate_title('is_blog', null),
            'description' => blog_title() . ' Blog',
            'canonical' => site_url() . 'blog' . $bCanonicalPageNum,
            'metatags' => generate_meta('is_blog', null),
            'page' => $page,
            'posts' => $posts,
            'bodyclass' => 'in-blog',
            'breadcrumb' => '<a href="' . site_url() . '">' . config('breadcrumb.home') . '</a> &#187; Blog',
            'pagination' => has_pagination($total, $perpage, $page),
            'is_blog' => true
        ), $layout);

I'm not sure this approach solves the problem without raising another, since /blog?page=1 still has the same canonical URL as /blog. On the other hand, the first page of blog posts, which I land on when coming back from other paginated pages, has the same content as /blog, so it shouldn't have a different canonical URL from /blog.
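One way to reason about the ?page=1 case is to normalize the page number before building the canonical URL, so that /blog and /blog?page=1 resolve to the same canonical while later pages each get their own. The sketch below is only an illustration: paginated_canonical() is a hypothetical helper, not an HTMLy function, and https://website/blog stands in for site_url() . 'blog'.

```php
<?php
// Hypothetical helper (not part of HTMLy): canonical URL for a paginated listing.
function paginated_canonical(string $baseUrl, $page): string
{
    // Cast to int so raw query-string input never ends up in the markup.
    $page = (int) $page;

    // Page 1, a missing page, or junk input all refer to the bare listing URL.
    if ($page <= 1) {
        return $baseUrl;
    }

    // Every later page gets its own canonical URL.
    return $baseUrl . '?page=' . $page;
}

echo paginated_canonical('https://website/blog', '1'), "\n"; // https://website/blog
echo paginated_canonical('https://website/blog', '3'), "\n"; // https://website/blog?page=3
```

With this rule, ?page=1 and the bare /blog URL share one canonical, which matches the fact that both serve the same content.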

To be honest, I don't even care about G's SEO practices, but I thought I'd share this for those who believe G's SERPs are serious business ;)

That's it for the [BUG REPORT].

As for [feature request]:

  1. Shouldn't the link to the first pagination page point to /blog instead of /blog?page=1 when the "enabled blog url" option is on?
  2. Shouldn't paginated pages also get friendly URLs in one of these forms: /blog/2, /blog/page/2, /blog/page-2, etc.? The "page" string could be a config variable like "Read more text", or be taken from the i18n language files.

danpros commented 2 months ago

Hello,

You should open a PR to change it, at least for the canonical code. For friendly URLs, we will keep the current format; it is fine.

Thanks, I will look into it. We need to change it across many index pages (tag, category etc.)

if ($page > 1) {
    $CanonicalPageNum = '?page=' . $page;
} else {
    $CanonicalPageNum = NULL;
}
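
Since the same rule has to be applied on several index pages, it could be wrapped in one small function that every route calls. This is a minimal sketch under stated assumptions: canonical_suffix() is a hypothetical name rather than an existing HTMLy function, and the listing URLs are placeholders for whatever site_url() and the route would produce.

```php
<?php
// Hypothetical wrapper around the rule above, so each index route can share it.
function canonical_suffix(int $page): string
{
    // An empty string concatenates the same way NULL does in the original snippet.
    return $page > 1 ? '?page=' . $page : '';
}

// Placeholder listing URLs (assumed for illustration only).
$listings = array(
    'blog'     => 'https://website/blog',
    'tag'      => 'https://website/tag/php',
    'category' => 'https://website/category/news',
);

$page = 2;
$canonicals = array();
foreach ($listings as $type => $url) {
    // Each listing type gets the same page suffix appended to its own base URL.
    $canonicals[$type] = $url . canonical_suffix($page);
}

print_r($canonicals);
```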

Joduai commented 2 months ago

Ohh, I didn't even check whether the $page var is accessible outside of get('/index', function () { . I assumed it wasn't, so I used from(). Nice & clean approach :)

As for categories, tags, etc., I don't have enough data on my small blog. On the other hand, it's just a matter of time before G search console comes up with such notices for other sections where pagination occurs, so I will look deeper into it then.