Nosto / nosto-magento2

https://marketplace.magento.com/nosto-module-nostotagging.html
Open Software License 3.0

nosto_index_product_queue_processor is running out of memory when reindex #790

Closed: paco-shum-fisheye closed this issue 1 year ago

paco-shum-fisheye commented 2 years ago

Magento 2 version(s) used: 2.4.3-P1
Extension version(s) affected: issue found on 5.2.10, tested 6.0.6

Description
When running Magento reindex, nosto_index_product_queue_processor is running out of memory.

bin/magento indexer:reindex nosto_index_product_queue_processor
Nosto Product Queue Processor index PHP Fatal error:  Allowed memory size of 2147483648 bytes exhausted (tried to allocate 20480 bytes) in /var/www/html/vendor/magento/framework/Serialize/Serializer/Json.php on line 37

Fatal error: Allowed memory size of 2147483648 bytes exhausted (tried to allocate 20480 bytes) in /var/www/html/vendor/magento/framework/Serialize/Serializer/Json.php on line 37

Check https://getcomposer.org/doc/articles/troubleshooting.md#memory-limit-errors for more info on how to handle out of memory errors.

How to reproduce
Have a large catalog and run a reindex.

Magento 2 mode

Full page cache

Nosto indexer mode

Possible Solution

Additional context
Within the admin, the Indexer Memory was set to 10%.

==> var/log/system.log <==
[2022-08-17 17:40:18] nosto.INFO: Begin a full reindex {"indexerId":"nosto_index_product_queue_processor"} []

==> var/log/debug.log <==
[2022-08-17 17:40:18] nosto.INFO: Begin a full reindex {"indexerId":"nosto_index_product_queue_processor"} []
[2022-08-17 17:40:18] nosto.DEBUG: Indexing by mode "none" {"indexerId":"nosto_index_product_queue_processor","sourceClass":"Nosto\\Tagging\\Model\\Indexer\\QueueProcessorIndexer\\Interceptor"} []
[2022-08-17 17:40:18] nosto.DEBUG: [START] Processing dimension: "default" {"indexerId":"nosto_index_product_queue_processor","storeId":"1","sourceClass":"Nosto\\Tagging\\Model\\Indexer\\QueueProcessorIndexer\\Interceptor"} []
[2022-08-17 17:40:18] nosto.DEBUG: Started processing 73179 of queue entries {"storeId":"1","sourceClass":"Nosto\\Tagging\\Model\\Service\\Update\\QueueProcessorService"} []
bin/magento indexer:status
+-------------------------------------+---------------------------------+------------+-----------+--------------------------+---------------------+
| ID                                  | Title                           | Status     | Update On | Schedule Status          | Schedule Updated    |
+-------------------------------------+---------------------------------+------------+-----------+--------------------------+---------------------+
| nosto_index_product_queue           | Nosto Product Queue             | Ready      | Schedule  | idle (0 in backlog)      | 2022-08-17 18:08:07 |
| nosto_index_product_queue_processor | Nosto Product Queue Processor   | Processing | Schedule  | suspended (0 in backlog) | 2022-08-17 17:40:18 |
+-------------------------------------+---------------------------------+------------+-----------+--------------------------+---------------------+
supercid commented 2 years ago

Hi, it looks like your total memory limit for PHP is 2GB, which is too small to run Magento. In case you have more memory available, make sure PHP is configured to use it.

paco-shum-fisheye commented 2 years ago

> Hi, it looks like your total memory limit for PHP is 2GB, which is too small to run Magento. In case you have more memory available, make sure PHP is configured to use it.

Thank you @supercid for responding so quickly. A PHP memory limit of 2GB should be more than enough to run a Magento instance; the Magento DevDocs themselves even recommend a 2GB memory limit.

After some further debugging we have been able to pinpoint the memory leak to the `\Nosto\Tagging\Model\ResourceModel\Product\Update\Queue\QueueCollection::_afterLoad` method. Looping over every item to unserialize the fields eventually causes the memory to build up and run out. During our tests, if we comment out line 192 the index completes as we would expect.
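To illustrate the failure mode being described, here is a minimal Python sketch (the names and data shapes are hypothetical, not the extension's actual code) contrasting eager deserialization in an `_afterLoad`-style hook, where every decoded payload stays resident for the collection's whole lifetime, with lazy per-item decoding:

```python
import json

def after_load_eager(rows):
    """Mimics an _afterLoad-style hook: decode every row's serialized
    payload up front, keeping all decoded payloads in memory at once."""
    return [{**row, "product_ids": json.loads(row["product_ids"])}
            for row in rows]

def iter_lazy(rows):
    """Decode each payload only while that row is being processed; the
    decoded data becomes collectable as soon as the loop advances."""
    for row in rows:
        yield {**row, "product_ids": json.loads(row["product_ids"])}

# Tiny stand-in for the queue collection's backing rows.
rows = [{"id": i, "product_ids": json.dumps(list(range(5)))}
        for i in range(3)]

eager = after_load_eager(rows)      # all payloads decoded simultaneously
lazy_first = next(iter_lazy(rows))  # only one payload decoded at a time
```

With ~73k queue entries, as in the log above, the eager variant's peak memory grows with the whole collection, while the lazy variant's stays bounded by a single entry.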

Thanks

supercid commented 2 years ago

That's a great finding! Thanks for investigating on your own; we will have a look on our side and make a fix if confirmed! BR,

fredden commented 2 years ago

@supercid please can you leave this issue open until the problem is resolved. When the problem has been resolved, please can you update this issue with details of which version we should upgrade to in order to get the fix.

fredden commented 2 years ago

@supercid please can you reopen this issue.

nasabs commented 2 years ago

@supercid have you fixed this yet? I am the end customer this issue is affecting and we pay you a lot of money. Your index is breaking a core magento function and you're not fixing it. This is unacceptable.

supercid commented 2 years ago

Hi @nasabs, apologies for the delay. I'm personally looking into the issue but I haven't found a clean way to resolve it yet. As soon as a fix is in place, we'll follow up here. BR,

supercid commented 2 years ago

Hi all, we have been trying to optimize the memory usage of the indexer, and there are a couple of reasons why it currently cannot be reduced further. We got rid of factory usages in the queue and the queue processor indexer, which improved memory usage by an average of 10.29%. That could have been improved by a further 3% by using DISTINCT in SQL to fetch even less data, but I was concerned about putting too much load on the DBs. Furthermore, with that change the DB queries took even longer than when using the factories as before.

Regarding the serialisation: that is a necessary operation to build a queue entry with the product IDs that will later be indexed by the message consumers. We do indeed need to loop over every batched entry; if you disable or comment that out, you'll crash the execution and no products will be sent to the message queue.

@paco-shum-fisheye

> Within the admin, the Indexer Memory was set to 10%.

I misread this the first time. What you should do is increase the amount allocated: 10% will only allocate ~205 MB of your total 2GB allocated for PHP. If you increase the amount allocated and the process still dies, you'll have to allocate more RAM for your PHP interpreter.
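The ~205 MB figure follows directly from the numbers in the error message and the admin setting; a quick check of the arithmetic:

```python
# memory_limit from the fatal error: 2147483648 bytes = 2 GiB
total_bytes = 2 * 1024 ** 3
indexer_share = 0.10  # the "Indexer Memory" admin setting of 10%

allocated_mb = total_bytes * indexer_share / 1024 ** 2
print(round(allocated_mb, 1))  # prints 204.8
```

So with the 10% setting, the indexer is effectively capped at roughly 205 MB even though PHP's overall memory_limit is 2 GiB.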

> Thank you @supercid for responding so quickly. A PHP memory limit of 2GB should be more than enough to run a Magento instance; the Magento DevDocs themselves even recommend a 2GB memory limit.

Due to the amount of calculation, data fetching, and overhead from Magento itself that we need to deal with before sending your product data to Nosto, 2GB is likely not enough. I haven't measured this precisely yet, but when I do, I'll update the extension's minimum requirements.

Here are some execution time stats from the benchmarks. Each run executes:

```
bin/magento indexer:reset nosto_index_product_queue; bin/magento indexer:reindex nosto_index_product_queue; bin/magento indexer:reset nosto_index_product_queue_processor; bin/magento indexer:reindex nosto_index_product_queue_processor
```

| Variant | Mean [s] | Min [s] | Max [s] | Relative |
|---|---:|---:|---:|---:|
| factories-master | 70.738 ± 4.843 | 62.112 | 73.256 | 1.00 |
| no-factories-no-distinct | 86.743 ± 0.235 | 86.473 | 86.969 | 1.00 |
| no-factories-WITH-distinct | 85.582 ± 1.099 | 84.991 | 87.529 | 1.00 |

fredden commented 2 years ago

@supercid thanks for the update. Please can you keep this open until the problem is resolved. (You mention that there is further work on your list for this, like testing and correcting your documentation regarding minimum memory allocation.)

Regarding the 10% setting: one can see from the PHP error message that the memory_limit is indeed 2 GiB, not 10% of it.

I recommend using an array-walk iterator (example), or moving the deserialisation closer to where it's actually required. I think the main problem is that the collection is too large and should be limited/paged. I don't know whether you unpack the data within the collection before or after paging has been applied. Moving the deserialisation closer to where it's necessary means that the memory usage it requires can be limited to the item currently being looped over (with a local variable), rather than requiring all the memory all the time.
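A sketch of the pattern being suggested here, again in Python with hypothetical names: page through the collection and unserialize only the row currently in hand, so peak memory is bounded by the page size rather than by the whole queue.

```python
import json

PAGE_SIZE = 100  # hypothetical page size, not a value from the extension

def iter_queue_entries(load_rows, page_size=PAGE_SIZE):
    """Walk the collection page by page, decoding one entry at a time.
    `load_rows(offset, limit)` stands in for a real paged DB query;
    each decoded payload lives only for its own loop iteration."""
    offset = 0
    while True:
        rows = load_rows(offset, page_size)
        if not rows:
            return
        for row in rows:
            yield json.loads(row["product_ids"])
        offset += page_size

# Simulated backing table of 250 serialized queue entries.
table = [{"product_ids": json.dumps([i])} for i in range(250)]

def load_rows(offset, limit):
    return table[offset:offset + limit]

decoded = list(iter_queue_entries(load_rows))
```

In real use the caller would process each yielded entry inside the loop rather than collecting them into a list (the `list(...)` here is only to make the sketch checkable); combined with paging, that keeps memory flat regardless of how many entries are queued.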

fredden commented 2 years ago

@supercid please can you reopen this.