EmicoEcommerce / Magento2Tweakwise-archived

Magento 2 module for Tweakwise integration

Is there anything that can improve the performance to generate a feed? #156

Closed: leonhelmus closed this issue 3 years ago

leonhelmus commented 3 years ago

For our development/production environment we are trying to generate a Tweakwise export feed (`tweakwise:export`) for a client, but generating the feed consumes a lot of resources. Running the export on our acceptance environment (which has the same stores/categories/products) takes 2463.98s and uses 5695.77MB of memory. Would it be possible to split the export by store or by page? That would make the server resources easier to manage.

Environment

Steps to reproduce

  1. Login to the server
  2. Run bin/magento tweakwise:export
  3. Get this message when finished: Feed written to //var/feeds/tweakwise.xml in 2463.98s using 5695.77Mb memory

Actual result

Generating the feed takes 2463.98s and 5695.77MB of memory. The resulting tweakwise.xml is 1.6GB. How can the performance of feed generation be improved? ...

Expected result

Generate a separate feed per store, or cap the number of pages per feed so that each individual feed is smaller.

gsomoza commented 3 years ago

Having only spent a few minutes looking at the code I see this:

  1. vendor/emico/tweakwise-export/src/Model/Write/Products.php:131 => XML will be flushed (written to file) every 100 iterations
  2. vendor/emico/tweakwise-export/src/Model/Write/EavIterator.php:288 => iterator batch size is 5000
  3. ~~When combining these, it means the XML will only be flushed every 100 * 5 000 = 500 000 products.~~

~~In other words, it seems that it will never write the XML to file until ALL products have been added to the XML in memory? So that's 29 595 products * 1 206 attributes = 35 691 570 XML entries in memory?~~

Even if this first impression is correct (and it's very possible my quick assessment is completely wrong), another factor to consider is whether flushing the XML actually frees any memory: the XML object itself is still in memory after flushing and probably still holds all of its items.

UPDATE: on second review, the generators are nested, so each iteration is a single product rather than a batch, and the multiplication above is not correct. Whether flushing the XML writer actually frees up memory might still be worth considering, though.
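The flushing concern can be sketched as follows. This is a purely illustrative Python model (the module itself is PHP, and none of these names come from its API): an incremental "flush" only bounds memory if the in-memory tree actually releases nodes after they have been written out.

```python
# Illustrative model: write each XML node to an output stream ("flush"),
# and compare what happens when the in-memory tree keeps every written
# node versus releasing nodes after writing them.
import io
import xml.etree.ElementTree as ET

def write_products(products, clear_after_write):
    out = io.StringIO()            # stand-in for the feed file on disk
    out.write("<products>")
    root = ET.Element("products")  # in-memory tree
    held = 0                       # most nodes retained in memory at once
    for p in products:
        el = ET.SubElement(root, "product", id=str(p))
        out.write(ET.tostring(el, encoding="unicode"))  # "flush" this node
        held = max(held, len(root))
        if clear_after_write:
            root.remove(el)        # release the node after writing it
    out.write("</products>")
    return held

# Without releasing nodes, the tree grows despite every "flush";
# with releasing, memory stays flat:
print(write_products(range(1000), clear_after_write=False))  # 1000
print(write_products(range(1000), clear_after_write=True))   # 1
```

In other words, flushing the output buffer and shrinking the in-memory document are two separate things, which is exactly the question raised above.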

gsomoza commented 3 years ago

OK, so what's happening is: 5000 items are loaded fully into memory (6+ million items in memory) and then flushed to disk 100 at a time. I think this alone might be enough to explain the memory consumption we're seeing. We'll test reducing the batch size from 5000 to 500 as a proof of concept of the memory-vs-speed tradeoff.
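As a rough illustration of that hypothesis (a Python sketch with hypothetical names and numbers, not the module's actual code): peak memory tracks the load batch size, not the flush interval, so shrinking the batch from 5000 to 500 should lower the peak even with the flush-every-100 behavior unchanged.

```python
# Illustrative model of the pattern described above: items are loaded in
# large batches, then written out ("flushed") in much smaller groups.

def load_batch(offset, batch_size):
    """Stand-in for the EAV iterator: each item is just a small dict."""
    return [{"id": offset + i} for i in range(batch_size)]

def export(total, batch_size, flush_every):
    peak = 0      # rough count of items referenced at once
    buffer = []   # items awaiting a flush
    for offset in range(0, total, batch_size):
        batch = load_batch(offset, batch_size)  # whole batch in memory
        for item in batch:
            buffer.append(item)
            peak = max(peak, len(batch) + len(buffer))
            if len(buffer) >= flush_every:
                buffer.clear()                  # "flush" to disk
    return peak

# Peak memory is dominated by the batch size, not the flush interval:
print(export(10_000, batch_size=5000, flush_every=100))  # 5100
print(export(10_000, batch_size=500, flush_every=100))   # 600
```

In this model the flush interval only adds a small constant on top of the batch size, which is why tuning the batch size is the lever worth testing.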

gsomoza commented 3 years ago

Changing the setting to 500 items reduced memory consumption to 1.2GB, which is much more manageable, so this seems like the right way forward. This ticket can be closed.