EmicoEcommerce / Magento2Tweakwise-archived

Magento 2 module for Tweakwise integration
Other
9 stars 25 forks source link

Whitelisting SEO URL's doesn't work #103

Closed Alex-MaxServ closed 4 years ago

Alex-MaxServ commented 4 years ago

Issue Brief

I'm trying to whitelist some filters to be indexed by Google. It seems the filter whitelist is ignored, no differences are visible between whiltelisted filters and others.

Environment

PHP Version: 7.1 Magento Version: 2.2.10 Tweakwise Version: 2.1.5 Tweakwise Export Version: 1.5.0 Magento Deploy Mode: development

Steps to reproduce

  1. Install the Magento2Tweakwise module
  2. Set Seo enabled = Yes on the Catalog - SEO page in Magento2 image
  3. Select some filters to whitelist
  4. Check the url while hovering over the different filters

Actual result

Hovering a whitelisted filter results in an URL with /#: image

Expected result

Whitelist filtered URL's must show the URL with filter options, in stead of the /#. I miss the /tuinsets/filtername/filteroption (for example: /tuinsets/zitplaatsen/2) This affects all filters.

edwinljacobs commented 4 years ago

This is by design. I checked the readme and the explanation of the settings max allowed facets and filter whitelist was sadly incorrect, shame on me :(

The reason is as follows: Filters are indexable if and only if they are in the whitelist and the selected filter count does not go above max_allowed_facets. The reason this is an AND check is because otherwise indexation will still happen on the non whitelisted filters since it is unclear which url is present in the filter (an arbitrary amount of filters could be selected beforehand).

Suppose max allowed facet is 1 and only "size" is in the whitelist. Then filter "color" with value "red" is not indexable (since "color" is not in the whitelist). If we now allow the size filter to still be indexable then url example.com/category/color/red/size/M would be indexable whereas example.com/category/color/red is not which is incorrect. This would lead to infinite crawling on filter urls which is undesirable.

I have updated the readme with this explanation. If this is an issue for your project then I would advise to either make a patch on 'src/Model/Seo/FilterHelper.php' and include that patch via for example vaimo/composer patches or make a plugin which overrides this behavior.

The function which is responsible for checking if a filter is indexable is found in src/Model/Seo/FilterHelper.php lines 55 to 70. I hope this explanation suffices.

jkekkel commented 4 years ago

@edwinljacobs Thanks for the feedback. I'm the project lead for this ticket on the client side. What I read in the readme on Github is

Seo (All settings depend on Enabled having value yes)

  1. Enabled: use Seo options yes or no.
  2. Filter whitelist: A list of filters which should be indexable. If a filter is marked as not indexable then its href attribute will be set to "#" its original url will be set in a data-seo-href attribute which will be used by javascript to navigate. Note that the category filter is always marked as indexable.
  3. Max allowed facets: This combines with the Filter whitelist setting. Filters are indexable if and only if they are in the whitelist and the selected filter count does not go above max_allowed_facets. The reason this is an AND check is because otherwise indexation will still happen on the non whitelisted filters and it is unclear which url is present (an arbitrary amount of filters could be selected). Suppose max allowed facet is 1 and only "size" is in the whitelist. Then filter "color" with value "red" is not indexable (since "color" is not in the whitelist). If we now allow the size filter to still be indexable then url example.com/category/color/red/size/M would be indexable whereas example.com/category/color/red is not which is incorrect. This would lead to infinite crawling on filter urls which is undesirable

With other words: "href" should be filled if the filter is whitelisted. This isn't the case in our situation. If we whitelist we see a # in the href, and if we don't whitelist we also see a # in the href.

Next to that; Whether we whitelist a filter or not, the url always has a NOINDEX,FOLLOW What we should see is:

edwinljacobs commented 4 years ago

Hi,

Can you tell me what the value is in max allowed facets field so that I can reproduce your issue?

jkekkel commented 4 years ago

The value in the max allowed facets field = 2

edwinljacobs commented 4 years ago

Thanks,

Ill look into it

edwinljacobs commented 4 years ago

Hi,

With value 2 for max allowed facets and only one filter whitelisted (size) I can partially reproduce your issue. I find that if I select the whitelisted filter then my robots value stays as is (in my case that is index follow) however If I select another size filter (so M and XL for example) my robots value goes to noindex,follow. The reason behind this is that the selected category also counts as a active filter. This seems incorrect and has been fixed (see branch issue/103-filterwhitelisting).

However this does not explain your case where you select any whitelisted filter and your robots value changes to noindex. I assume that in your explanation you included the case "Go to category page (no filter selected yet), select whitelisted filter, check robots value and find that it is incorrect".

Perhaps some explanation on how this should work is in order. When rendering robots, we check the selected filters, if one of these filters is deemed not indexable (explained below) we alter the index/noindex part of robots. By altering we mean check if it is already noindex if so do nothing otherwise change it to noindex, the input value for this process is the original magento robots as configured in the backend under content > Design configuration > store > search engine robots or something else if you have plugins on the Magento\Framework\View\Page\Config getRobots method. So if that setting is already on noindex for whatever reason we do nothing.

By not indexable we mean seo enabled and the filter is not a category filter and we are above the max_allowed_facet setting or the filter is not whitelisted. While rendering filters we do this exact same check to determine if that filter should be rendered with a href or with data-seo-href. As mentioned above the current category was counted as an active filter (the tweakwise navigation response exposes the currently selected filters and category is one of those filters) for the purposes of max_allowed_facets we excluded the category filter from the count

The relevant code can be found in Model/Seo/FilterHelper.php and Model/Seo/Robots/Plugin.php

Maybe this fixes your issue?

With kind regards.

edwinljacobs commented 4 years ago

Hi,

It turns out that derived properties in the navigator are not facilitated in the whitelist, sadly we missed that in the implementation. After some consideration we decided to change the whitelist field to a comma separated text field, reason being that the values of a multiselect field are saved as comma separated text (so this is an easy non breaking change). In order to whitelist the derived property one needs to add its url-name (see image) to the whitelist. This will be added to the next release.

derived-propery-url-name

If at some point tweakwise provides a service where we can query all properties we will use that instead as comma separated text is error sensitive compared to a multiselect.

With kind regards.

edwinljacobs commented 4 years ago

Hello,

This has been included in release https://github.com/EmicoEcommerce/Magento2Tweakwise/releases/tag/v2.1.7

With kind regards