baldwin-agency / magento2-module-url-data-integrity-checker

Magento 2 module which can find potential URL-related problems in your catalog data
MIT License

Killed Integrity Urlkey process in console #13

Open gifrancohe opened 4 years ago

gifrancohe commented 4 years ago

After the process finishes and the progress reaches 100%, the console prints "Killed" and the backend shows the message "We are already refreshing the product url key's, just have a little patience". Nothing further happens, and when I try to run the process again from the console, an exception with the same message is displayed: "An unexpected exception occured: 'We are already refreshing the product url key's, just have a little patience'"

I hope you can help me. Thank you very much.

hostep commented 4 years ago

Hmm, that's strange. Does the 'Killed' message come from Magento, or from your operating system? Maybe you are running up against the limits of your system and the process gets killed by the operating system?

You can always specify the --force flag to restart the process in case the previous run got stuck. See the help section:

$ bin/magento catalog:product:integrity:urlkey --help
Description:
  Checks data integrity of the values of the url_key product attribute.

Usage:
  catalog:product:integrity:urlkey [options]

Options:
  -f, --force           Force the command to run, even if it is already marked as already running
  -h, --help            Display this help message
  -q, --quiet           Do not output any message
  -V, --version         Display this application version
      --ansi            Force ANSI output
      --no-ansi         Disable ANSI output
  -n, --no-interaction  Do not ask any interactive question
  -v|vv|vvv, --verbose  Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug

gifrancohe commented 3 years ago

Apparently it's my operating system, but I don't know why it's killing the process. I tried executing the command with the memory_limit parameter and the -f flag, but the same thing happens. My store has around 390,000 products, so I think it's because of the volume of URLs to process.

The other commands work perfectly. Here is an example of how I am executing the command:

$ php -d memory_limit=6G bin/magento catalog:product:integrity:urlkey -f

Thanks for the help.

hostep commented 3 years ago

Thanks for the feedback!

I've only tested this module with a collection of max ~40.000 products if I remember correctly, so your 390.000 products might indeed need a lot of memory and might trigger the Out Of Memory Killer on your OS.
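One way to confirm whether the kernel's OOM killer ended the process is to look for its messages in the kernel log (for example, the output of `dmesg`). A minimal Python sketch, assuming the common Linux message format `Out of memory: Killed process <pid> (<name>)`; the exact wording differs between kernel versions, and the sample log lines below are made up for illustration:

```python
import re

# Matches "Killed process" (newer kernels) and "Kill process" (older ones).
# The pattern is an assumption about the log format, not a guaranteed contract.
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return (pid, process_name) pairs for every OOM-killer line in log_text."""
    return [(int(pid), name) for pid, name in OOM_PATTERN.findall(log_text)]


# Hypothetical kernel log excerpt, for illustration only.
sample_log = """\
[1234.5] usb 1-1: new high-speed USB device
[5678.9] Out of memory: Killed process 4242 (php) total-vm:6291456kB, anon-rss:3900000kB
"""

oom_kills = find_oom_kills(sample_log)
print(oom_kills)
```

On a real server the input would be the actual kernel log, for instance the text printed by `dmesg`, which may require elevated privileges to read.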

I'll take a stab at trying out this approach later this week, maybe it will help in reducing the amount of memory needed: https://www.matheusgontijo.com/2018/02/10/magento-2-working-with-large-collections-php-fatal-error-allowed-memory-size-of-xxxx-bytes-exhausted/
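The general idea behind processing a large collection in bounded memory is to fetch and handle it one page at a time instead of loading everything at once. A language-agnostic sketch in Python, not the module's PHP code; `fetch_page` and the in-memory `FAKE_CATALOG` are hypothetical stand-ins for a paged database query:

```python
# Hypothetical catalog: every fourth product has an empty url_key.
FAKE_CATALOG = [
    {"id": i, "url_key": "" if i % 4 == 0 else f"product-{i}"}
    for i in range(1, 11)
]


def fetch_page(offset, limit):
    """Stand-in for a paged query such as SELECT ... LIMIT limit OFFSET offset."""
    return FAKE_CATALOG[offset:offset + limit]


def iter_in_batches(fetch, batch_size):
    """Yield rows one by one while holding at most one page in memory."""
    offset = 0
    while True:
        page = fetch(offset, batch_size)
        if not page:
            break
        yield from page
        offset += len(page)


# Only the ids of problematic products are kept, not the full rows.
missing_ids = [
    row["id"]
    for row in iter_in_batches(fetch_page, batch_size=3)
    if not row["url_key"]
]
print(missing_ids)
```

The trade-off is more round trips to the data source in exchange for a memory footprint that depends on the batch size rather than on the catalog size.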

hostep commented 3 years ago

Hi @gifrancohe!

I've made a first attempt at reducing the memory usage for generating product url key problems on the memory-optimisations branch.

Could you maybe test this out? You can run the following composer command to get that experimental branch:

composer require baldwin/magento2-module-url-data-integrity-checker:dev-memory-optimisations

From what I've seen, this:

Could you maybe test this out a bit and give me some feedback?

Thanks!

gifrancohe commented 3 years ago

Hi @hostep.

I apologize for not answering sooner. I ran the test with the new branch, but the result was the same. I also tried increasing the resources of my test server, going from 2 CPUs and 4 GB of RAM to 4 CPUs and 16 GB of RAM, but the same thing happens: after the progress bar reaches 100%, the console keeps processing for a few minutes until the process is killed.

hostep commented 3 years ago

Okay, that's very useful information!

So this means the gathering of information works without memory problems right now. It's writing out that data to the storage (which is currently one big json file) that is problematic now. So that's the next thing which needs to be optimised. I also heard from a colleague of mine that loading in the json file and displaying that data in a grid in the backend of Magento can crash if the data is too big. So that's also something we need to take a look at. I'm thinking of storing the data in the database instead of a big json file, that can probably be done more efficiently in terms of memory usage.
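To illustrate why one big JSON file is costly: serialising it usually needs the whole result set in memory at once, whereas a record-per-line format can be written and read back one record at a time. A sketch in Python, assuming newline-delimited JSON as the storage format (a swapped-in technique for illustration, since the comment above proposes a database table instead); `find_problems` is a hypothetical generator standing in for the checker:

```python
import json
import os
import tempfile


def find_problems():
    """Hypothetical checker that yields one problem record at a time."""
    for product_id in range(1, 6):
        yield {"product_id": product_id, "problem": "duplicated url_key"}


def write_jsonl(records, path):
    """Write each record as its own JSON line; return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
            count += 1
    return count


def read_jsonl(path):
    """Yield records lazily, so a reader never loads the whole file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield json.loads(line)


output_path = os.path.join(tempfile.mkdtemp(), "url-key-problems.jsonl")
written = write_jsonl(find_problems(), output_path)
first_record = next(read_jsonl(output_path))
print(written, first_record)
```

The same streaming property is what makes a database table attractive here: rows can be inserted in batches while checking, and the backend grid can page through them.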

It might take me a while (a couple of weeks) to rewrite this part of the tool though. But it sounds like we need to take care of this for shops with a lot of products which have a lot of url problems.

Thanks for the feedback!

DominicWatts commented 3 years ago

@gifrancohe

Just read your issue regarding large collection and memory usage.

I've not got to play with such a large collection just yet. The largest so far is around 80K products. As a proof of concept, I produced a simple product CSV exporter after having issues with large collections myself with a couple of different feed generator extensions on various Magento versions. Using the iterator massively slowed down the process I was working on. However, it did get the job done.

My proof-of-concept extension is tagged: 1.0.1 uses a standard collection, 1.0.2 uses an iterator.

However, both dump to a CSV file and have the potential to use lots of memory. I also print out some stats.

I'm curious how the extension will fare with such a large collection

Extension: https://github.com/DominicWatts/ProductCsvExport/

https://github.com/DominicWatts/ProductCsvExport/blob/1.0.1/Console/Command/Product.php vs https://github.com/DominicWatts/ProductCsvExport/blob/1.0.2/Console/Command/Product.php

I'd love to hear how it performs on such a large collection.