akeneo / pim-community-dev

[Community Development Repository] The open source Product Information Management (PIM)
http://www.akeneo.com

Importing the same SKU over multiple steps isn't working #5309

Closed. hostep closed this issue 7 years ago.

hostep commented 7 years ago

I think I'm reporting a bug.

Hi guys

We have a custom importer module with multiple steps. Several of these steps import product data, and sometimes the same SKU is imported in more than one step. In the example below, the same SKU can be imported in step 3 (import_products) and again in step 4 (import_product_descriptions), each time with different attributes.

When this happens, the following warning appears during step 4:

values[sku]: The value AABBCC is already set on another product for the unique attribute sku: AABBCC

I understand that the same product can't be imported twice within a single step, but I expected that it could be imported across different steps. Is this expected behavior, or is it a bug?

We are using Akeneo CE 1.5.14. For more details, see the attached akeneo-pim-system-info_2016-12-06_12-51.txt.

Our batch_jobs.yml file (*** replaces confidential information):

connector:
    name: *** Import Connector
    jobs:
        ***_import_from_***:
            title: ***_import_connector.jobs.***_import_from_***.title
            type:  import
            steps:
                download_import_files:
                    title: ***_import_connector.jobs.***_import_from_***.download_import_files.title
                    class: "%***_import_connector.downloader.products.class%"
                    services:
                        xml_handler: ***_import_connector.downloader.handler.xml
                        images_handler: ***_import_connector.downloader.handler.images
                import_attribute_options:
                    title: ***_import_connector.jobs.***_import_from_***.import_attribute_options.title
                    services:
                        reader:    ***_import_connector.reader.file.xml_options
                        processor: pim_connector.processor.denormalization.attribute_option.flat
                        writer:    pim_connector.writer.doctrine.attribute_option
                import_products:
                    title: ***_import_connector.jobs.***_import_from_***.import_products.title
                    services:
                        reader:    ***_import_connector.reader.file.xml_products
                        processor: pim_connector.processor.denormalization.product.flat
                        writer:    pim_connector.writer.doctrine.product
                import_product_descriptions:
                    title: ***_import_connector.jobs.***_import_from_***.import_product_descriptions.title
                    services:
                        reader:    ***_import_connector.reader.file.xml_product_descriptions
                        processor: pim_connector.processor.denormalization.product.flat
                        writer:    pim_connector.writer.doctrine.product

Thanks!

hostep commented 7 years ago

It would be great to get some feedback on whether this is expected behavior or a bug. That way I can inform our client and decide whether we need to work around it or can rely on you delivering a fix.

Thanks! :)

fabienlem commented 7 years ago

Hello @hostep,

I'm sorry for my late reply. It should be possible to import the same SKU over multiple steps.

I think the error occurs because your database contains duplicate SKUs. Can you check whether that is the case? If so, please clean up your data and run the import again.

We have been informed that SKU duplication can happen when using MySQL; it is linked to Doctrine's object detacher. We are currently working out the best way to prevent it.

Regards, Fabien

fabienlem commented 7 years ago

@hostep,

In addition, we think your issue is linked to the unique value validator (https://github.com/akeneo/pim-community-dev/blob/master/src/Pim/Component/Catalog/Validator/Constraints/UniqueValueValidator.php#L38). It stores every unique value it has already checked in an internal cache, so that it can detect duplicate ("twin") lines within the same batch of products. Because this stateful validator is not reset between steps, in your fourth step it finds the SKU already in its cache and reports it as a twin line, even though that SKU was validated during a previous step.
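The caching behavior described above can be illustrated with a minimal Python model. This is not Akeneo's PHP UniqueValueValidator: the class StatefulUniqueValueValidator, its validate method and the run_step helper are hypothetical, and the error text only approximates the real warning. The point it shows is that a single validator instance shared by steps 3 and 4 remembers the SKU from step 3 and flags it as a twin in step 4.

```python
class StatefulUniqueValueValidator:
    """Hypothetical model of a validator that caches the unique values it has seen.

    Not Akeneo's PHP class; it only mimics the caching behavior described above.
    """

    def __init__(self):
        # Cache of (attribute code, value) pairs already validated.
        # It lives as long as the validator instance does.
        self.unique_values_set = set()

    def validate(self, attribute, value):
        """Return an error message if the value was already seen, else None."""
        key = (attribute, value)
        if key in self.unique_values_set:
            return (f"The value {value} is already set on another product "
                    f"for the unique attribute {attribute}")
        self.unique_values_set.add(key)
        return None


def run_step(validator, skus):
    """Validate every SKU of one import step and collect the errors."""
    errors = []
    for sku in skus:
        error = validator.validate("sku", sku)
        if error is not None:
            errors.append(error)
    return errors


# One validator instance shared by all steps, like a shared service.
validator = StatefulUniqueValueValidator()
step3_errors = run_step(validator, ["AABBCC"])  # import_products
step4_errors = run_step(validator, ["AABBCC"])  # import_product_descriptions
print(step3_errors)  # []
print(step4_errors)  # one false "twin" error for AABBCC
```

Nothing is wrong with the data here; the false positive comes only from state carried over from the previous step.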

So you can try clearing the "uniqueValuesSet" of the UniqueValueValidator at the end of the third step to see whether that fixes the issue.
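The suggested workaround amounts to clearing the cache at a step boundary. Here is a minimal Python sketch of that idea, generalized to every step. UniqueValueCache, its reset method and run_job's reset_between_steps flag are hypothetical names; in Akeneo the reset would have to be wired into your own connector (for example after step 3), and this sketch does not show how to do that in PHP.

```python
class UniqueValueCache:
    """Hypothetical stand-in for the validator's internal "uniqueValuesSet"."""

    def __init__(self):
        self.unique_values_set = set()

    def is_duplicate(self, value):
        """Record the value; return True if it was already recorded."""
        if value in self.unique_values_set:
            return True
        self.unique_values_set.add(value)
        return False

    def reset(self):
        # Forget everything seen so far (what clearing uniqueValuesSet would do).
        self.unique_values_set.clear()


def run_job(steps, cache, reset_between_steps):
    """Run each step's SKUs through the cache; return the duplicates per step."""
    report = {}
    for step_name, skus in steps:
        report[step_name] = [sku for sku in skus if cache.is_duplicate(sku)]
        if reset_between_steps:
            # Hypothetical end-of-step hook: clear the cache so the next
            # step only detects twins within its own batch.
            cache.reset()
    return report


steps = [
    ("import_products", ["AABBCC", "DDEEFF"]),
    ("import_product_descriptions", ["AABBCC", "AABBCC"]),
]
print(run_job(steps, UniqueValueCache(), reset_between_steps=False))
print(run_job(steps, UniqueValueCache(), reset_between_steps=True))
```

Without the reset, both AABBCC lines in the second step are reported, including the one that only repeats step 1. With the reset, only the genuine twin inside the second step is reported, so duplicate detection within a step is preserved.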

Regards, Fabien

hostep commented 7 years ago

Ok thanks for the info Fabien!

I can no longer reproduce the bug described above, and running the following query shows no duplicates:

SELECT COUNT(*) AS number, value_string FROM pim_catalog_product_value WHERE attribute_id = 1 GROUP BY value_string ORDER BY number DESC;
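The query above lists every SKU with its count, so duplicates have to be spotted by eye at the top of the result. Adding a HAVING clause returns only the duplicated values. Below is a minimal sketch that runs the same check on an in-memory SQLite table. The table only has the columns the query uses plus hypothetical id and product_id columns, and it is not the real MySQL schema. As in the original query, it assumes that attribute_id = 1 is the sku attribute.

```python
import sqlite3

# Simplified, hypothetical stand-in for pim_catalog_product_value: only the
# columns the query needs. The real MySQL table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pim_catalog_product_value ("
    " id INTEGER PRIMARY KEY, product_id INTEGER,"
    " attribute_id INTEGER, value_string TEXT)"
)
rows = [
    (1, 1, "AABBCC"),            # attribute_id 1 assumed to be "sku"
    (2, 1, "DDEEFF"),
    (3, 1, "AABBCC"),            # same SKU on a second product
    (3, 2, "Some description"),  # another attribute, ignored by the query
]
conn.executemany(
    "INSERT INTO pim_catalog_product_value (product_id, attribute_id, value_string)"
    " VALUES (?, ?, ?)",
    rows,
)

# Same grouping as the query above, but HAVING keeps only the SKUs that
# appear more than once, so an empty result means "no duplicates".
duplicates = conn.execute(
    "SELECT COUNT(*) AS number, value_string"
    " FROM pim_catalog_product_value"
    " WHERE attribute_id = 1"
    " GROUP BY value_string"
    " HAVING COUNT(*) > 1"
    " ORDER BY number DESC"
).fetchall()
print(duplicates)  # [(2, 'AABBCC')]
conn.close()
```

The same HAVING COUNT(*) > 1 clause is standard SQL and can be appended to the original query when running it against MySQL.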

If I encounter the problem again, I'll rerun this query to see whether it reveals a duplicate SKU.

I'll also look into the UniqueValueValidator you mentioned when I find some time.

Thanks again for the info!