baldwin-agency / magento2-module-image-cleanup

Magento 2 module which can clean up old image files that are no longer being used
MIT License

Request: Duplicate images #6

Open jorgb90 opened 4 months ago

jorgb90 commented 4 months ago

A great addition to this already great plugin would be an option to remove duplicates. Duplicate images occur, for example, when importing products in bulk.

hostep commented 4 months ago

Hi @jorgb90: can you give me some more info about this particular situation?

The tool will already remove any file which isn't referenced in the database, so even if duplicated images exist on the filesystem, they should get removed as long as they aren't referenced in the database.

So I'm not quite sure what you mean exactly? Some concrete example could help here.

Thanks!

jorgb90 commented 4 months ago

@hostep I have a Magento setup which stores every product image three times. I first thought this was caused by updating the products through imports, but on further investigation it's happening because the images are uploaded for the global scope and for 2 storeviews. :/ I guess we just need to delete the duplicates in the storeviews.

My initial thought was that, because they are uploaded and present in both the database and the filesystem, they currently won't be detected by this extension. But the hash of those images should be identical, since it's the same image. That way the duplicates could be detected and cleaned up as well.
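A minimal sketch of the hash idea, written in Python rather than the module's PHP and assuming all product images live under a single directory tree. The names `file_digest` and `find_duplicate_images` and the choice of SHA-256 are illustrative assumptions, not part of this extension.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_images(root):
    """Group files under `root` by content hash; return only groups with 2+ files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Each returned group holds files with identical content. Deciding which copy to keep, and repointing the database rows at it, is a separate step that this sketch does not cover.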

hostep commented 4 months ago

Aha okay, it makes sense now.

Do the database entries use the exact same filename for global and storeview values? Or are the files also duplicated on disk with a different filename?

If they are the same filename, I think using the EAV Cleaner module with its eav:attributes:restore-use-default-value command might get rid of those unneeded values in the database.

If they aren't the same filename, then I guess we could add some checking in this module for this by:

This is probably a very expensive operation and could take many minutes, if not hours, to run if you have a lot of products. So I'm not sure yet whether we will have time to implement this, or whether it's worth it at the moment.
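One common way to reduce that cost, sketched here in Python as an assumption and not as something this module does, is to compare file sizes first and hash only the files that share a size with at least one other file. The function names below are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def duplicate_candidates_by_size(root):
    """Cheap first pass: group files by size, which needs only a stat() per file."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            by_size[os.path.getsize(full)].append(full)
    return [sorted(paths) for paths in by_size.values() if len(paths) > 1]


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images are never loaded whole into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def confirm_duplicates(candidate_groups):
    """Expensive second pass: hash only the size-matched candidates."""
    confirmed = []
    for group in candidate_groups:
        by_hash = defaultdict(list)
        for path in group:
            by_hash[sha256_of(path)].append(path)
        confirmed.extend(paths for paths in by_hash.values() if len(paths) > 1)
    return confirmed
```

Files with a unique size are never read, so the hashing work scales with the number of plausible duplicates instead of the whole media directory. How much this saves depends on how varied the file sizes in the catalog are.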