baldwin-agency / magento2-module-url-data-integrity-checker

Magento 2 module which can find potential url related problems in your catalog data
MIT License
261 stars 28 forks source link

catalog:product:integrity:urlkey shows false positives #35

Closed peterjaap closed 2 years ago

peterjaap commented 2 years ago

When I run bin/magento catalog:product:integrity:urlkey these are the results;

+------------+-----------------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+        
| Product ID | SKU             | Store ID | Problem                                                                                                                                                        |        
+------------+-----------------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 300        | KG101           | 1        | Product has a duplicated url_key value (playgrass). It's the same as another product (ID: 346, Store: 1). Please fix because this will cause problems.         |
| 300        | KG101           | 14       | Product has a duplicated url_key value (playgrass). It's the same as another product (ID: 346, Store: 14). Please fix because this will cause problems.        |
| 346        | DE115           | 1        | Product has a duplicated url_key value (playgrass). It's the same as another product (ID: 300, Store: 1). Please fix because this will cause problems.         |
| 346        | DE115           | 14       | Product has a duplicated url_key value (playgrass). It's the same as another product (ID: 300, Store: 14). Please fix because this will cause problems.        |
+------------+-----------------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+

When I check my database using this query, I get the following results;

SELECT t.* FROM catalog_product_entity_varchar t WHERE attribute_id = (SELECT attribute_id FROM eav_attribute WHERE attribute_code = 'url_key' AND entity_type_id = 4) and entity_id IN (300,346)

value_id attribute_id store_id entity_id value
19165 86 0 300 playgrass-KG101
3761705 86 1 300 playgrass
3761706 86 14 300 playgrass
26872 86 0 346 playgrass-DE115
3761707 86 1 346 playgrass
3640255 86 14 346 playgrass

I then remove all url_rewrites for these products and regenerate them using our regenerate catalog URL extension;

bin/magento regenerate:product:url -s all 300 346

These are the results;

url_rewrite_id entity_type entity_id request_path target_path redirect_type store_id description is_autogenerated metadata
2 product 300 playgrass catalog/product/view/id/300 0 1 NULL 1 NULL
3 product 346 playgrass catalog/product/view/id/346 0 14 NULL 1 NULL

Which is perfectly fine, no URL collissions! Therefore I believe this extension returns false positives.

hostep commented 2 years ago

Hmm, and what are the generated url rewrites for:

peterjaap commented 2 years ago

@hostep none since those products are not present in the website that store view is attached to

hostep commented 2 years ago

Aaaah okay, so store_id 1 is assigned to a different website then store_id 14. And product with entity_id 300 is only assigned to that first website and product with entity_id 346 is only assigned to the second website?

In that case, you are probably right and we shouldn't report this as an error. I'll see if I can create a case to reproduce this and see if we can fix this.

However, I'm wondering how those url_key's got added to the database for those store_id's? I'm going to guess that both products were first assigned to both websites, and later got unassigned from one of both websites? If you would remove the url_key values for the products assigned to a store_id which belongs to a website that the product is no longer assigned to, then I think you can also get rid of these false positives.

peterjaap commented 2 years ago

@hostep yes that's correct. The url keys got in the database because of a migration from M1 to M2. There's no validation at that point and M1 allowed this. We come across issues like this often with migrated shops. Garbage out, garbage in.

I'll try your latest suggestion about removing those keys once I'm back in the office, after my holiday this week :)

peterjaap commented 2 years ago

@hostep you're right, that fixed the integrity errors. I've deleted them by running these queries;

DELETE FROM catalog_product_entity_varchar WHERE attribute_id = 86 AND store_id = 1 AND entity_id NOT IN (SELECT product_id FROM catalog_product_website WHERE website_id = 1)
DELETE FROM catalog_product_entity_varchar WHERE attribute_id = 86 AND store_id = 14 AND entity_id NOT IN (SELECT product_id FROM catalog_product_website WHERE website_id = 3)

Thanks!