AlfredoRamos / phpbb-ext-seo-metadata

SEO Metadata extension for phpBB
GNU General Public License v2.0
11 stars 7 forks source link

Prefix description with subject if body length <25 #87

Closed MichaIng closed 2 years ago

MichaIng commented 2 years ago

The post body text can be very short, in which case SEO recommendations may not be met, recommending description meta tag lengths of at least 25 characters. This change prefixes the description with the post subject, in case the body text is less than 25 characters long.

MichaIng commented 2 years ago

What is not 100% clean is that post_text is within <t>...</t> tags, the post_subject is not.

AlfredoRamos commented 2 years ago

Hi :wave:

Thanks for your contribution, I'll review the changes and give you feed back.

AlfredoRamos commented 2 years ago

However, probably there is a cleaner way to add the subject inside the tags, so that XML syntax remains valid.

It could be done using DOM, so the title could be inserted inside the <t> and </t> tag.

MichaIng commented 2 years ago

You mean at a later stage, when applying the strings to the final meta tags? The title is always the threads subject, in case prefixed with the post number, but not the posts subject. I though the posts subject would suite better since it can be freely chosen, hence can be completely different than the threads title and would then be more related to the posts body it prefixes directly. But I'm open to adjust the PR in any way. Major aim is to silence Bing webmaster SEO issues in case of short posts 🙂.

As alternative I can also try split $description in a generic way after any leading tag, or using the regex ^<[rt][ >] you use in clean_desription(). I'm just not sure whether it's worth it, given that the result is correct already.

AlfredoRamos commented 2 years ago

You mean at a later stage, when applying the strings to the final meta tags?

No, in the same way you are using it right now, however instead of concatenating a string (subject + description), it would be added directly inside the description XML using DOM.

The title is always the threads subject, in case prefixed with the post number, but not the posts subject.

My bad, I meant to say subject instead of title. Because I agree that the title has some text we don't need.

Major aim is to silence Bing webmaster SEO issues in case of short posts

And I like the idea, so this is definitely something I want to add.

As alternative I can also try split $description in a generic way after any leading tag, or using the regex ^<[rt][ >] you use in clean_desription().

Sure, I suggested DOM because, in my opinion, that's the way to manipulate XML. I'm trying to use regex as less as possible because it can get messy really fast.

However you have creative freedom to decide which method is best :+1:

I'm just not sure whether it's worth it, given that the result is correct already.

I would prefer to do it "the right way" to avoid problems with nested <t> tags with other extension.

I'm aware of at least 2 extension (not counting my own) that extends this one, so I try to avoid creating a conflict somehow.

MichaIng commented 2 years ago

Ah okay, sorry for the misunderstanding, since I'm no professional PHP developer, I don't know yet how to "use DOM" here 😅, but it sounds like the best solution indeed. If you have a hint, let me know, in the meantime I'll try to find some docs to figure this out myself 🙂.

MichaIng commented 2 years ago

I think I found it: https://www.php.net/manual/en/domcharacterdata.insertdata.php

Will try it out locally and in case update the PR tomorrow.

AlfredoRamos commented 2 years ago

I don't know yet how to "use DOM" here :sweat_smile:, but it sounds like the best solution indeed. If you have a hint, let me know

Oh.

A very easy way to do it is to create a new DOMText node using DOMDocument::createTextNode() and then add it to the root node of the XML (either <r> or <t>) using the DOMNode::insertBefore() method.

You could do something like (not tested):


$description = $row['post_text'];

// Load the XML document
$dom = new \DOMDocument;
$dom->loadXML($description);

// Prepend the post subject in the XML of the post body
$dom->documentElement->insertBefore(
    $dom->createTextNode($row['post_subject'] . ' '),
    $dom->documentElement->firstChild
);

// Save modified XML
$description = $dom->saveXML($dom->documentElement);
MichaIng commented 2 years ago

Works very well. I implemented it as you suggested.

AlfredoRamos commented 2 years ago

Sorry for the dealy, I've been dealing with some issues.

But such short post texts are very rare, and probably selecting both values marginally decreases performance (increases load) for all other post requests?

In my test the difference is insignificant (~ 0.012 ms on each), so don't worry about that.

Besides, if my guesses are correct, I might get a optimization validation note for that second SQL query.

MichaIng commented 2 years ago

Besides, if my guesses are correct, I might get a optimization validation note for that second SQL query.

You mean by this, a not from CI/CD pipeline to optimise this? Ready to be merged from my end btw, I have this deployed on our phpBB instance since the last push and reviewed a few short posts/threads where worked as intended while not touching longer ones.

AlfredoRamos commented 2 years ago

You mean by this, a not from CI/CD pipeline to optimise this?

No, the validation team reviews the code and make notes on it.

Some notes might get the extension to be rejected, so that's why I'll change that second query.

Ready to be merged from my end btw

Ok.

I'll merge it as is and make some changes to it.

Thanks! :+1:

MichaIng commented 2 years ago

Ah wow, I didn't know about this. Nice on the one hand, but otherwise it may be annoying 😄. I was wondering why so many apps are not provided via official extensions database, and now I start to think that strict reviews may be it. On the plus side is that that way we can better trust apps in the extensions database and the same way may less trust apps not there, since probably they are not well coded and therefore were rejected 😛.