BlackbitDigitalCommerce / pimcore-data-director

Import Bundle for Pimcore
16 stars 3 forks source link

MYSQL temp table problem: error: 1114 #136

Open betterapp opened 7 months ago

betterapp commented 7 months ago

Hello

There is such query

SELECT i.id, i.updated, dataport_resource.resource FROM plugin_pim_rawitem i INNER JOIN plugin_pim_dataport_resource dataport_resource ON i.dataport_resource_id = dataport_resource.id WHERE dataport_resource.dataportId = 8 ORDER BY i.dataport_resource_id, updated, i.priority LIMIT 0,25

When we run it it making such error for us:

General error: 1114 The table '/tmp/#sql_1b39_2' is full

We needed to move MYSQL data to separate partition because looks like this select create temporary table which had 24GB and growing.

Why this query run for such a long time ? We have more then 100k records in table plugin_pim_dataport_resource for 3 dataports.

for what this table is used ? plugin_pim_dataport_resource

why there is so much data ? why this table is not cleared by data director.

can we remove data from there ?

betterapp commented 7 months ago

Can we also clear those tables ? image

When I try to open Preview panel in dataport I got error 500 from database.

we have 2mln and 4mln records in those tables

betterapp commented 7 months ago

@BlackbitDevs any answers ?

BlackbitDevs commented 7 months ago

Hi @betterapp,

plugin_pim_dataport_resource is the table which contains the different import resources for a certain dataport. For example when you run a dataport with different SQL conditions or different import files, there will be a new entry in this table.

plugin_pim_rawitem contains the datasets (= rows) for the import resources. plugin_pim_rawitemData contains the actual data for those rows.

I wonder why plugin_pim_rawitem is bigger than plugin_pim_rawitemData because actually this is a 1:n relation. 1 item in plugin_pim_rawitem relates to multiple items in plugin_pim_rawitemData.

Yes, you can clear all of them. In fact this is also being done by the maintenance job: All dataport resources which have not been used for 14 days, automatically get deleted - and because of the foreign keys in plugin_pim_rawitem and plugin_pim_rawitemData the corresponding items from those tables get deleted, too. Please check if the foreign keys are in place (e.g. via SHOW CREATE TABLE plugin_pim_rawitem).

betterapp commented 7 months ago

As I see there is on foregin key. is it correct ? @BlackbitDevs

CREATE TABLE `plugin_pim_rawitem` (
    `id` BINARY(16) NOT NULL,
    `dataport_resource_id` INT(11) UNSIGNED NOT NULL,
    `updated` DATETIME NOT NULL,
    `hash` CHAR(32) NOT NULL COLLATE 'utf8mb4_general_ci',
    `priority` VARCHAR(191) NOT NULL COLLATE 'utf8mb4_general_ci',
    PRIMARY KEY (`id`) USING BTREE,
    UNIQUE INDEX `unique_rawitem_dataport_resource_id_hash` (`dataport_resource_id`, `hash`) USING BTREE,
    INDEX `idx_rawitem_dataport_resource_id_priority` (`dataport_resource_id`, `priority`) USING BTREE,
    INDEX `idx_dataport_resource_id_updated_priority` (`dataport_resource_id`, `updated`, `priority`) USING BTREE,
    CONSTRAINT `fk_rawItem_dataport_resource` FOREIGN KEY (`dataport_resource_id`) REFERENCES `plugin_pim_dataport_resource` (`id`) ON UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB
;
CREATE TABLE `plugin_pim_rawitemData` (
    `rawItemId` BINARY(16) NOT NULL,
    `fieldNo` INT(11) UNSIGNED NOT NULL,
    `value` LONGTEXT NULL DEFAULT NULL COLLATE 'utf8mb4_general_ci',
    PRIMARY KEY (`rawItemId`, `fieldNo`) USING BTREE,
    FULLTEXT INDEX `fulltext_value` (`value`),
    CONSTRAINT `fk_rawitemdata_rawitem` FOREIGN KEY (`rawItemId`) REFERENCES `plugin_pim_rawitem` (`id`) ON UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB
;
BlackbitDevs commented 7 months ago

Yes, is correct.

Is this a parametrized export? This use-case we have to improve in future because currently the raw data for every SQL condition gets saved again and again. I want to implement an m:n relation here so that the raw data for a certain data object / import hash only exists once and then there is a reference table between this data table and plugin_pim_dataport_resource.

Or is this dataport static with always same import / data resource?

In either way, you can remove raw data at any time - as long as the import resources which provided this data are still available.