BlackbitDigitalCommerce / pimcore-data-director

Import Bundle for Pimcore

Please send daily report with missing selectbox/measurements options instead of separate email for each item #34

Open kaurov opened 2 years ago

kaurov commented 2 years ago

Currently, a separate email is sent for each case where no existing option is found. Relevant files: vendor/blackbit/data-director/lib/Pim/EmailReportingLogger.php and vendor/blackbit/data-director/Resources/views/email-report.html.twig

When you have 10 000 products to import, it means 10k+ emails.

Combined with this bug in DataDirector, https://github.com/BlackbitDigitalCommerce/pimcore-data-director/issues/29, it means millions of emails.

Can you please group the cases in one table and send one report per day?

Thanks.

BlackbitDevs commented 2 years ago

No, the EmailReportingLogger only sends the mail once per dataport run. So when you import 10 000 products in one import, you will get all the errors which happened during this import run in 1 email - and they even get grouped (you will see "This or similar errors happened 4 times. Please see process log (link) of this run to get further details." in the log).
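To illustrate the kind of grouping described above, here is a minimal sketch in Python. It is not the bundle's actual implementation: the `normalize` and `group_errors` helpers and the sample messages are hypothetical, and the assumption is that "similar" errors differ only in quoted values and numbers.

```python
import re
from collections import Counter


def normalize(message: str) -> str:
    """Mask double-quoted values and numbers so similar errors share one key."""
    message = re.sub(r'"[^"]*"', "<value>", message)
    return re.sub(r"\d+", "<n>", message)


def group_errors(messages):
    """Return (first example message, count) pairs, most frequent first."""
    counts = Counter()
    examples = {}
    for msg in messages:
        key = normalize(msg)
        counts[key] += 1
        # Keep the first message seen for each group as its example.
        examples.setdefault(key, msg)
    return [(examples[key], n) for key, n in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical error messages, only for demonstration.
    sample = [
        'No option "red" for field color on object 12',
        'No option "blue" for field color on object 13',
        'Unit "kg" unknown',
    ]
    for example, count in group_errors(sample):
        print(f"{example} (this or similar errors happened {count} times)")
```

With such grouping, one run produces one row per kind of error rather than one row per affected item.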

Or do you import those 10 000 products in separate import processes? If yes, why?

kaurov commented 2 years ago

Or do you import those 10 000 products in single import processes? If yes, why?

yes, exactly.

As we discussed in https://github.com/BlackbitDigitalCommerce/pimcore-data-director/issues/30, Data-Director does not balance memory (sadly), which causes the following memory usage:

data-director:extract for 10000 xml files consumes 94% CPU and 5.1+ GB memory for 1+ hour. data-director:process for those 10000 xml files consumes 15+ GB memory for 8+ hours.

So the server provider simply kills such import processes as dangerous.

You recommended:

Alternatively you can directly start processing this one raw data item

I did that in a foreach loop, and the result is great: 15% CPU and 0.6 GB memory usage for the same ~8 hours and 10k xml files. In multithreaded mode over ssh: 36% CPU, up to 0.6 GB memory per process (usually up to 0.3 GB) and 2 hours for the same 10k xml files.

The only problem is the email notification. With any new data structure, I will be notified about it 10k times.

If you fix the memory usage in data-director:complete, then one email per import would be OK. But will it really be only one import? If it is scheduled by

BlackbitDevs commented 2 years ago

But will it really be only one import?

Yes, because when you use a folder (or glob expression) as import resource, then in the raw data extraction step the data from all matching files gets extracted (sorted ascending by file modification date in case there are multiple raw data items which will change the same object - so the latest changes get imported last). Only after that does the raw data processing step get executed, which updates the Pimcore data objects.
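As a rough illustration of that ordering, here is a small Python sketch, not the bundle's code. The `files_oldest_first` helper, the folder path and the `*.xml` pattern are assumptions made for the example.

```python
from pathlib import Path
from typing import List


def files_oldest_first(folder: str, pattern: str = "*.xml") -> List[Path]:
    """Return files matching the glob pattern, oldest modification time first.

    Processing in this order means that when several files touch the same
    object, the most recently modified file is applied last.
    """
    return sorted(Path(folder).glob(pattern), key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # Hypothetical import folder; replace with a real path to try it out.
    for path in files_oldest_first("/tmp/import"):
        print(path.name)
```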

Do you have a chance to try version 3.1.x-dev? In that version I have fixed a lot of memory issues, especially around the creation of data object versions, where Pimcore clones the whole object with DeepCopy. I analysed this with the Xdebug profiler, which showed that this cloning needed about 40% of the runtime (and of course also a lot of memory).

Nevertheless I will try to find a way to group error mails...

BlackbitDevs commented 2 years ago

Since version 3.1 the error log mail for a certain dataport does not get sent if another such email was sent within the last 5 minutes. This will prevent you from receiving loads of emails when there are a lot of separate imports. This can happen not only when you run the single object imports manually as described here, but even more often when you have a failing import which gets automatically restarted.
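The throttling idea can be sketched roughly as follows in Python. This is only an illustration under assumptions, not how the bundle implements it: the `MailThrottle` class and the dataport ids are hypothetical, and the state is kept in memory, whereas separate import processes would need shared storage (a database row or a file) to see each other's timestamps.

```python
import time


class MailThrottle:
    """Allow at most one error mail per dataport within a time window."""

    def __init__(self, interval_seconds: float = 300, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.last_sent = {}  # dataport id -> time of the last mail sent

    def should_send(self, dataport_id) -> bool:
        now = self.clock()
        last = self.last_sent.get(dataport_id)
        if last is not None and now - last < self.interval:
            return False  # a mail for this dataport went out recently
        self.last_sent[dataport_id] = now
        return True


if __name__ == "__main__":
    throttle = MailThrottle(interval_seconds=300)
    print(throttle.should_send(7))  # first mail for dataport 7
    print(throttle.should_send(7))  # suppressed, within the 5 minute window
```

Each dataport is tracked separately, so a noisy import does not suppress error mails from other dataports.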

Nevertheless the "daily error mail of all dataports" would still be a nice feature...