OpenMage / magento-lts

Official OpenMage LTS codebase | Migrate easily from Magento Community Edition in minutes! Download the source code for free or contribute to OpenMage LTS | Security vulnerability patches, bug fixes, performance improvements and more.
https://www.openmage.org
Open Software License 3.0
868 stars 436 forks source link

Remove duplicate and unneeded database indices #2992

Open loekvangool opened 1 year ago

loekvangool commented 1 year ago

pt-duplicate-key-checker is a tool by Percona that can find redundant database indexes. A blog about this tool is available.

It reduced the size of our database with about 10%, and it speeds up writing. Most of those optimizations seem to apply to any OpenMage installation, but some of them are from 3rd party modules.

Some of the duplicates apply to InnoDB only, because InnoDB always includes the PRIMARY column in any B-tree index, it is therefore not necessary to add the PRIMARY column to any other index.

Another common occurence is that there is an index (A), as well as an index (A,B). In that case, (A) is not necessary, (A,B) satisfies it*.

* But, not (B, A). The sequence matters, hence the term "left-prefix of"

Sample output from pt-duplicate-key-checker:

# IDX_CATALOG_PRODUCT_ENTITY_DECIMAL_ENTITY_ID is a left-prefix of UNQ_CAT_PRD_ENTT_DEC_ENTT_ID_ATTR_ID_STORE_ID
# Key definitions:
#   KEY `IDX_CATALOG_PRODUCT_ENTITY_DECIMAL_ENTITY_ID` (`entity_id`),
#   UNIQUE KEY `UNQ_CAT_PRD_ENTT_DEC_ENTT_ID_ATTR_ID_STORE_ID` (`entity_id`,`attribute_id`,`store_id`),
# Column types:
#         `entity_id` int(10) unsigned not null default 0 comment 'entity id'
#         `attribute_id` smallint(5) unsigned not null default 0 comment 'attribute id'
#         `store_id` smallint(5) unsigned not null default 0 comment 'store id'
# To remove this duplicate index, execute:
ALTER TABLE `catalog_product_entity_decimal` DROP INDEX `IDX_CATALOG_PRODUCT_ENTITY_DECIMAL_ENTITY_ID`;

There are dozens of these recommendations. One discussion point is that most apply to InnoDB and not MyISAM.

loekvangool commented 1 year ago

Can we assume InnoDB for large-scale deployments going forward? Removing these indices will not break MyISAM tables but it will reduce performance if they are used. But for performance issues to matter, the deployment has to be large and/or busy to begin with. We can probably assume InnoDB for any case in which indices matter. And otherwise people should convert to InnoDB. Maybe a doc change is needed first?

justinbeaty commented 1 year ago

Can we check for innoDB in the upgrade script that will drop the indexes? I’m not sure if Magento has done anything like that in the past.

loekvangool commented 1 year ago

Just speaking from theory, I never made upgrade scripts, but it looks like we can check for InnoDB (it's just PHP code, anything can be done) before removing the indices. However, there can be no interaction with the user, so it would mean just skipping the dropping of indices based on the mysql engine used. And then we would remove the definition of the indices so they are not created for new installs, which cannot be based on the engine used afaik. And we cannot track an engine change by the user and start removing/adding indices based on that change.

I think the bottom line is: we would have to be comfortable with just going for it. Add the recommendation to the docs and assume InnoDB going forward. And no MyISAM site would break from this, it would just get slow(er).

fballiano commented 1 year ago

I’d check the engine type for the table and drop the index only for innodb. I wouldn’t care about future engine change, if you switch engines you can also check if you need indexes 😉

loekvangool commented 1 year ago

Yup assumptions have to be made otherwise it gets too complicated.

fballiano commented 1 year ago

i've run it against a vanilla openmage installation:

openmage duplicate indexes.txt

justinbeaty commented 1 year ago

And then we would remove the definition of the indices so they are not created for new installs, which cannot be based on the engine used afaik

Would it be better to not change the old install scripts, and just have the new install script drop the indices (if innoDB === true)?

loekvangool commented 1 year ago

Would it be better to not change the old install scripts, and just have the new install script drop the indices (if innoDB === true)?

I'm no expert on this, happy to go with the flow.

Flyingmana commented 1 year ago

I would say we can expect InnoDB, its standard by now, and we can also make it a new hard requirement for future versions. I would also prefer to only modify the install scripts to not create them anymore, and leave the update part as a possible manual performance optimization. (something we could put into our documentation, maybe under an own performance category)

also what I found, according to https://stackoverflow.com/a/68672584/716029 the compound index is only useful as replacement for a single index, if its the first part of it. Thats something we should keep in mind.

loekvangool commented 1 year ago

also what I found, according to https://stackoverflow.com/a/68672584/716029 the compound index is only useful as replacement for a single index, if its the first part of it. Thats something we should keep in mind.

Yes the sequence matters, hence the term "left-prefix of". pt-duplicate-key-checker is keeping this in mind as well.

elidrissidev commented 1 year ago

Great find! Actually I was just learning about this exact topic this week, so it's a good coincidence :)

loekvangool commented 1 year ago

Thanks. Feel free to help out, to be brutally honest it would take me too much time to implement these changes. I'll be contributing in other PRs this is just too close to the core to be comfortable.

For clarity: I executed 100% of the recommendations of the tool, after carefully checking and understanding the issues, and monitored performance and database size.

elidrissidev commented 1 year ago

It's fairly easy with @fballiano's script. I manually checked a few of them and they totally make sense. I will create a PR tomorrow.

fballiano commented 1 year ago

this should be the full query list:

ALTER TABLE `api2_acl_attribute` DROP INDEX `IDX_API2_ACL_ATTRIBUTE_USER_TYPE`;
ALTER TABLE `catalog_category_entity` DROP INDEX `IDX_CATALOG_CATEGORY_ENTITY_PATH_ENTITY_ID`, ADD INDEX `IDX_CATALOG_CATEGORY_ENTITY_PATH_ENTITY_ID` (`path`);
ALTER TABLE `catalog_compare_item` DROP INDEX `IDX_CATALOG_COMPARE_ITEM_CUSTOMER_ID`;
ALTER TABLE `catalog_product_entity_datetime` DROP INDEX `IDX_CATALOG_PRODUCT_ENTITY_DATETIME_ENTITY_ID`;
ALTER TABLE `catalog_product_entity_decimal` DROP INDEX `IDX_CATALOG_PRODUCT_ENTITY_DECIMAL_ENTITY_ID`;
ALTER TABLE `catalog_product_entity_group_price` DROP INDEX `IDX_CATALOG_PRODUCT_ENTITY_GROUP_PRICE_ENTITY_ID`;
ALTER TABLE `catalog_product_entity_int` DROP INDEX `IDX_CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID`;
ALTER TABLE `catalog_product_entity_text` DROP INDEX `IDX_CATALOG_PRODUCT_ENTITY_TEXT_ENTITY_ID`;
ALTER TABLE `catalog_product_entity_tier_price` DROP INDEX `IDX_CATALOG_PRODUCT_ENTITY_TIER_PRICE_ENTITY_ID`;
ALTER TABLE `catalog_product_entity_varchar` DROP INDEX `IDX_CATALOG_PRODUCT_ENTITY_VARCHAR_ENTITY_ID`;
ALTER TABLE `catalog_product_index_eav` DROP INDEX `IDX_CATALOG_PRODUCT_INDEX_EAV_ENTITY_ID`;
ALTER TABLE `catalog_product_index_eav_decimal` DROP INDEX `IDX_CATALOG_PRODUCT_INDEX_EAV_DECIMAL_ENTITY_ID`;
ALTER TABLE `catalog_product_index_eav_decimal_idx` DROP INDEX `IDX_CATALOG_PRODUCT_INDEX_EAV_DECIMAL_IDX_ENTITY_ID`;
ALTER TABLE `catalog_product_index_eav_idx` DROP INDEX `IDX_CATALOG_PRODUCT_INDEX_EAV_IDX_ENTITY_ID`;
ALTER TABLE `catalog_product_index_price` DROP INDEX `IDX_CATALOG_PRODUCT_INDEX_PRICE_WEBSITE_ID`;
ALTER TABLE `catalog_product_link` DROP INDEX `IDX_CATALOG_PRODUCT_LINK_LINK_TYPE_ID`;
ALTER TABLE `catalog_product_link_attribute_decimal` DROP INDEX `IDX_CAT_PRD_LNK_ATTR_DEC_PRD_LNK_ATTR_ID`;
ALTER TABLE `catalog_product_link_attribute_int` DROP INDEX `IDX_CATALOG_PRODUCT_LINK_ATTRIBUTE_INT_PRODUCT_LINK_ATTRIBUTE_ID`;
ALTER TABLE `catalog_product_link_attribute_varchar` DROP INDEX `IDX_CAT_PRD_LNK_ATTR_VCHR_PRD_LNK_ATTR_ID`;
ALTER TABLE `catalog_product_option_price` DROP INDEX `IDX_CATALOG_PRODUCT_OPTION_PRICE_OPTION_ID`;
ALTER TABLE `catalog_product_option_title` DROP INDEX `IDX_CATALOG_PRODUCT_OPTION_TITLE_OPTION_ID`;
ALTER TABLE `catalog_product_option_type_price` DROP INDEX `IDX_CATALOG_PRODUCT_OPTION_TYPE_PRICE_OPTION_TYPE_ID`;
ALTER TABLE `catalog_product_option_type_title` DROP INDEX `IDX_CATALOG_PRODUCT_OPTION_TYPE_TITLE_OPTION_TYPE_ID`;
ALTER TABLE `catalog_product_super_attribute` DROP INDEX `IDX_CATALOG_PRODUCT_SUPER_ATTRIBUTE_PRODUCT_ID`;
ALTER TABLE `catalog_product_super_attribute_label` DROP INDEX `IDX_CAT_PRD_SPR_ATTR_LBL_PRD_SPR_ATTR_ID`;
ALTER TABLE `catalog_product_super_attribute_pricing` DROP INDEX `IDX_CAT_PRD_SPR_ATTR_PRICING_PRD_SPR_ATTR_ID`;
ALTER TABLE `catalog_product_super_link` DROP INDEX `IDX_CATALOG_PRODUCT_SUPER_LINK_PRODUCT_ID`;
ALTER TABLE `cataloginventory_stock_item` DROP INDEX `IDX_CATALOGINVENTORY_STOCK_ITEM_PRODUCT_ID`;
ALTER TABLE `catalogrule_customer_group` DROP INDEX `IDX_CATALOGRULE_CUSTOMER_GROUP_RULE_ID`;
ALTER TABLE `catalogrule_group_website` DROP INDEX `IDX_CATALOGRULE_GROUP_WEBSITE_RULE_ID`;
ALTER TABLE `catalogrule_product` DROP INDEX `IDX_CATALOGRULE_PRODUCT_RULE_ID`;
ALTER TABLE `catalogrule_website` DROP INDEX `IDX_CATALOGRULE_WEBSITE_RULE_ID`;
ALTER TABLE `catalogsearch_result` DROP INDEX `IDX_CATALOGSEARCH_RESULT_QUERY_ID`;
ALTER TABLE `core_translate` DROP INDEX `IDX_CORE_TRANSLATE_STORE_ID`;
ALTER TABLE `core_url_rewrite` DROP INDEX `IDX_CORE_URL_REWRITE_ID_PATH`;
ALTER TABLE `core_variable_value` DROP INDEX `IDX_CORE_VARIABLE_VALUE_VARIABLE_ID`;
ALTER TABLE `customer_address_entity_datetime` DROP INDEX `IDX_CUSTOMER_ADDRESS_ENTITY_DATETIME_ENTITY_ID`;
ALTER TABLE `customer_address_entity_decimal` DROP INDEX `IDX_CUSTOMER_ADDRESS_ENTITY_DECIMAL_ENTITY_ID`;
ALTER TABLE `customer_address_entity_int` DROP INDEX `IDX_CUSTOMER_ADDRESS_ENTITY_INT_ENTITY_ID`;
ALTER TABLE `customer_address_entity_text` DROP INDEX `IDX_CUSTOMER_ADDRESS_ENTITY_TEXT_ENTITY_ID`;
ALTER TABLE `customer_address_entity_varchar` DROP INDEX `IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID`;
ALTER TABLE `customer_entity` DROP INDEX `IDX_CUSTOMER_ENTITY_EMAIL_WEBSITE_ID`;
ALTER TABLE `customer_entity_datetime` DROP INDEX `IDX_CUSTOMER_ENTITY_DATETIME_ENTITY_ID`;
ALTER TABLE `customer_entity_decimal` DROP INDEX `IDX_CUSTOMER_ENTITY_DECIMAL_ENTITY_ID`;
ALTER TABLE `customer_entity_int` DROP INDEX `IDX_CUSTOMER_ENTITY_INT_ENTITY_ID`;
ALTER TABLE `customer_entity_text` DROP INDEX `IDX_CUSTOMER_ENTITY_TEXT_ENTITY_ID`;
ALTER TABLE `customer_entity_varchar` DROP INDEX `IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID`;
ALTER TABLE `downloadable_link` DROP INDEX `IDX_DOWNLOADABLE_LINK_PRODUCT_ID`;
ALTER TABLE `downloadable_link_title` DROP INDEX `IDX_DOWNLOADABLE_LINK_TITLE_LINK_ID`;
ALTER TABLE `downloadable_sample_title` DROP INDEX `IDX_DOWNLOADABLE_SAMPLE_TITLE_SAMPLE_ID`;
ALTER TABLE `eav_attribute` DROP INDEX `IDX_EAV_ATTRIBUTE_ENTITY_TYPE_ID`;
ALTER TABLE `eav_attribute_label` DROP INDEX `IDX_EAV_ATTRIBUTE_LABEL_ATTRIBUTE_ID`;
ALTER TABLE `eav_entity_datetime` DROP INDEX `IDX_EAV_ENTITY_DATETIME_ENTITY_ID`;
ALTER TABLE `eav_entity_datetime` DROP INDEX `IDX_EAV_ENTITY_DATETIME_ENTITY_TYPE_ID`;
ALTER TABLE `eav_entity_datetime` DROP INDEX `IDX_EAV_ENTITY_DATETIME_ATTRIBUTE_ID`;
ALTER TABLE `eav_entity_decimal` DROP INDEX `IDX_EAV_ENTITY_DECIMAL_ENTITY_ID`;
ALTER TABLE `eav_entity_decimal` DROP INDEX `IDX_EAV_ENTITY_DECIMAL_ENTITY_TYPE_ID`;
ALTER TABLE `eav_entity_decimal` DROP INDEX `IDX_EAV_ENTITY_DECIMAL_ATTRIBUTE_ID`;
ALTER TABLE `eav_entity_int` DROP INDEX `IDX_EAV_ENTITY_INT_ENTITY_ID`;
ALTER TABLE `eav_entity_int` DROP INDEX `IDX_EAV_ENTITY_INT_ENTITY_TYPE_ID`;
ALTER TABLE `eav_entity_int` DROP INDEX `IDX_EAV_ENTITY_INT_ATTRIBUTE_ID`;
ALTER TABLE `eav_entity_text` DROP INDEX `IDX_EAV_ENTITY_TEXT_ENTITY_ID`;
ALTER TABLE `eav_entity_varchar` DROP INDEX `IDX_EAV_ENTITY_VARCHAR_ENTITY_ID`;
ALTER TABLE `eav_entity_varchar` DROP INDEX `IDX_EAV_ENTITY_VARCHAR_ENTITY_TYPE_ID`;
ALTER TABLE `eav_entity_varchar` DROP INDEX `IDX_EAV_ENTITY_VARCHAR_ATTRIBUTE_ID`;
ALTER TABLE `eav_form_element` DROP INDEX `IDX_EAV_FORM_ELEMENT_TYPE_ID`;
ALTER TABLE `eav_form_fieldset` DROP INDEX `IDX_EAV_FORM_FIELDSET_TYPE_ID`;
ALTER TABLE `eav_form_fieldset_label` DROP INDEX `IDX_EAV_FORM_FIELDSET_LABEL_FIELDSET_ID`;
ALTER TABLE `newsletter_queue_link` DROP INDEX `IDX_NEWSLETTER_QUEUE_LINK_QUEUE_ID`;
ALTER TABLE `sales_order_tax_item` DROP INDEX `IDX_SALES_ORDER_TAX_ITEM_TAX_ID`;
ALTER TABLE `sales_payment_transaction` DROP INDEX `IDX_SALES_PAYMENT_TRANSACTION_ORDER_ID`;
ALTER TABLE `salesrule_coupon_usage` DROP INDEX `IDX_SALESRULE_COUPON_USAGE_COUPON_ID`;
ALTER TABLE `salesrule_customer_group` DROP INDEX `IDX_SALESRULE_CUSTOMER_GROUP_RULE_ID`;
ALTER TABLE `salesrule_label` DROP INDEX `IDX_SALESRULE_LABEL_RULE_ID`;
ALTER TABLE `salesrule_website` DROP INDEX `IDX_SALESRULE_WEBSITE_RULE_ID`;
ALTER TABLE `tag_relation` DROP INDEX `IDX_TAG_RELATION_TAG_ID`;
ALTER TABLE `tag_summary` DROP INDEX `IDX_TAG_SUMMARY_TAG_ID`;
ALTER TABLE `tax_calculation` DROP INDEX `IDX_TAX_CALCULATION_TAX_CALCULATION_RATE_ID`;
ALTER TABLE `tax_calculation_rate_title` DROP INDEX `IDX_TAX_CALCULATION_RATE_TITLE_TAX_CALCULATION_RATE_ID`;
ALTER TABLE `tax_calculation_rule` DROP INDEX `IDX_TAX_CALC_RULE_PRIORITY_POSITION_TAX_CALC_RULE_ID`, ADD INDEX `IDX_TAX_CALC_RULE_PRIORITY_POSITION_TAX_CALC_RULE_ID` (`priority`,`position`);
ALTER TABLE `widget_instance_page_layout` DROP INDEX `IDX_WIDGET_INSTANCE_PAGE_LAYOUT_LAYOUT_UPDATE_ID`;
fballiano commented 1 year ago

I started https://github.com/OpenMage/magento-lts/pull/3004 but anybody is welcome to chip in and add part of the implementation :-)