BlackbitDigitalCommerce / pimcore-data-director

Import Bundle for Pimcore

Import create two images #8

Closed betterapp closed 2 years ago

betterapp commented 2 years ago

I have a DataPort which imports some objects. The object has one image field (Image type). In the import file I have a column ImageUrl containing the URL of an image.

I have set up a mapping that maps the object's Image field to the ImageUrl column from the import file, and configured this in the settings: [image]

When I run the import, the images are saved into the configured asset folder, but I see that it creates two identical images with different names; one ends with the suffix -1: [image]

Is it a bug?

BlackbitDevs commented 2 years ago

Have you checked the Overwrite images checkbox?

[Screenshot 2022-02-09 at 09:47:47]

If yes, then the behaviour is a bug. If not, then it is correct that two assets (one with a suffix) get created. The reason is that for remote files (in this case a URL) we do not compare file hashes, because that would take too much time for recurring imports: you would always have to fetch the image from the URL to calculate its hash, only to find out that exactly this image is already assigned to the field of the current object. For local files the hashes are compared, and if the source file is identical to the Pimcore asset file, no new asset is created. In the past, Data Director also checked the Last-Modified and Content-Length headers; I have to think again about why we dropped this...
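To illustrate the distinction between local and remote sources, here is a minimal Python sketch. It is not Data Director's actual code (the bundle is written in PHP); the function names `file_sha256` and `can_reuse_existing_asset` and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def can_reuse_existing_asset(source: str, existing_asset: Path) -> bool:
    """Return True only when a local source is byte-identical to the asset.

    Hypothetical helper: remote sources are never hashed, because that
    would require downloading the file on every import run.
    """
    if urlparse(source).scheme in ("http", "https"):
        return False  # remote: skip the hash comparison entirely
    source_path = Path(source)
    if not source_path.is_file() or not existing_asset.is_file():
        return False
    return file_sha256(source_path) == file_sha256(existing_asset)
```

With this logic, a local file that matches the existing asset is reused, while a URL always falls through to "create a new asset" unless some other check (such as an overwrite option) applies.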

betterapp commented 2 years ago

This does not make sense to me. If the file is new and is downloaded for the first time, there should be one file, not two. I could understand it if we ran the import again and the file had changed, but the first fetch should not create two files.

BlackbitDevs commented 2 years ago

> If the file is new and is downloaded for the first time, there should be one file, not two.

Yes, if you only imported one raw data item, then this is true, independent of the Overwrite images checkbox. Did you only import one raw data item? When multiple raw data items refer to the same image URL, the behaviour is correct, because fetched images are currently not cached. Every raw data item is processed in isolation: it does not know that another raw data item already fetched the same image. It only sees that an asset already exists at the target path, and for this reason it adds the _1 suffix.

If this is really a problem and you cannot use the Overwrite images checkbox, please write an email to info@blackbit.de. We will try to implement the above-mentioned Last-Modified and/or Content-Length check as soon as possible.
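The effect of processing each raw data item in isolation can be sketched as follows. This is a hedged Python illustration, not the bundle's implementation: `next_free_path`, `import_items`, the `/images` target folder, and the exact suffix format are assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def next_free_path(target: str, existing: set[str]) -> str:
    """Append _1, _2, ... to the file name until the path is unused."""
    path = PurePosixPath(target)
    candidate = path
    counter = 0
    while str(candidate) in existing:
        counter += 1
        candidate = path.with_name(f"{path.stem}_{counter}{path.suffix}")
    return str(candidate)


def import_items(items: list[dict], existing: set[str]) -> list[str]:
    """Create one asset per item; no item knows what earlier items fetched."""
    created = []
    for item in items:
        file_name = PurePosixPath(urlparse(item["ImageUrl"]).path).name
        # Hypothetical target folder; only "does the path exist?" is checked.
        path = next_free_path(f"/images/{file_name}", existing)
        existing.add(path)
        created.append(path)
    return created
```

Two items pointing at `https://example.com/a/photo.jpg` would yield `/images/photo.jpg` and `/images/photo_1.jpg`, because the second item only sees that the target path is taken.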

BlackbitDevs commented 2 years ago

In version 2.8 I re-implemented the check for remote assets, based on the Last-Modified header. When importing an HTTP(S) URL for which an asset already exists at the target path, the Last-Modified value is compared to the modification timestamp of the Pimcore asset. If the Pimcore asset has the same timestamp or is newer, the existing asset is used.

If you want, you can try it with version 2.8.x-dev.

We had removed the original check because we had problems with stream wrapper URLs like s3:// or gs://, from which we of course cannot fetch HTTP headers. The header check for remote assets is now implemented only for HTTP(S) resources.
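The comparison described above could look roughly like this in Python. It is only a sketch under assumptions: `should_reuse_asset` is a hypothetical name, the header value is passed in rather than fetched, and the real implementation may treat missing or malformed headers differently.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional
from urllib.parse import urlparse


def should_reuse_asset(
    url: str, last_modified: Optional[str], asset_modified: datetime
) -> bool:
    """Reuse the existing asset if it is as new as, or newer than, the remote file.

    asset_modified must be timezone-aware. Stream wrapper URLs such as
    s3:// or gs:// are skipped because they expose no HTTP headers.
    """
    if urlparse(url).scheme not in ("http", "https"):
        return False
    if not last_modified:
        return False  # no header, so nothing to compare against
    try:
        remote_modified = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return False  # malformed header value
    if remote_modified.tzinfo is None:
        # Assumption: treat an offset-less date as UTC.
        remote_modified = remote_modified.replace(tzinfo=timezone.utc)
    return asset_modified >= remote_modified
```

For example, an asset modified on 2022-02-09 at 09:00 UTC would be reused for a response carrying `Last-Modified: Wed, 09 Feb 2022 08:47:47 GMT`, but not for one dated a day later.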

betterapp commented 2 years ago

My test import file has only one record with two columns: name and URL, with a remote link to an image. I created a class Test with two fields: name (input type) and image (image type). After the import I see one new object, which is correct, and two identical assets with different file names, as I wrote earlier.

I use Data Director version 2.7.8.

BlackbitDevs commented 2 years ago

And when you remove those 2 image assets and execute the import again you end up with 2 asset files? This would really be a bug - but this does not happen on my system. Perhaps you can export your dataport configuration so that I can take a look at it?

betterapp commented 2 years ago

Exactly. I removed those 2 files and the object, ran the import again, and I see 1 object + 2 assets. Here are the files (just remove the .txt suffix): image.xlsx.txt image.json.txt

BlackbitDevs commented 2 years ago

You have not defined a key field in the attribute mapping. Because of this, all fields are used as key fields; see the log:

[WARNING] No key field(s) specified in attribute mapping. Falling back to using all mapped fields as key fields. Please go to attribute mapping and set one or more fields as key fields which can be used to find already existing elements.

When it finds multiple objects which match the key fields, all of them get updated one after another. This would explain why the image gets created multiple times when multiple matching objects exist. Can you send the import log (the one which opens when you click "Successfully finished" in the history panel)?
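As a rough illustration of the key-field fallback from the warning, consider the following Python sketch. `find_matches` and the dictionary-based objects are hypothetical; the real matching happens against Pimcore objects.

```python
def find_matches(objects, row, mapped_fields, key_fields=()):
    """Return every object whose key fields equal the row's values.

    Assumption: with no key fields configured, all mapped fields
    act as key fields, as the warning in the log describes.
    """
    keys = list(key_fields) or list(mapped_fields)
    return [
        obj for obj in objects
        if all(obj.get(field) == row.get(field) for field in keys)
    ]


objects = [
    {"id": 1, "name": "A", "image": "x"},
    {"id": 2, "name": "A", "image": "y"},
]
row = {"name": "A", "image": "x"}

# Fallback: both mapped fields must match, so only object 1 is found.
fallback_ids = [o["id"] for o in find_matches(objects, row, ["name", "image"])]

# Explicit key field "name": both objects match and both would be updated.
keyed_ids = [o["id"] for o in find_matches(objects, row, ["name", "image"], ["name"])]
```

Every object in the returned list is updated in turn, so each match triggers its own image import.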

I have tested with the latest 2.7 (2.7.10) and there I cannot reproduce the problem.

betterapp commented 2 years ago

Hmm. But there is only one object, right?

BlackbitDevs commented 2 years ago

I don't know ;-) But I will if you provide the import log.