VirtoCommerce / vc-platform

Virto Commerce B2B Innovation Platform
https://virtocommerce.com

Product import is unstable (CSV import performance problems) #583

Closed amironoff closed 8 years ago

amironoff commented 8 years ago

Hello,

I'm trying to import just 15k products initially using the CSV import facility. The import has been very unstable so far. Sometimes it stops after 300 products, sometimes after 900. The most I have managed to import in one run is 3k. This runs on decent Azure infrastructure. Worst of all, VC doesn't provide any feedback on failure. It either stops silently or shows a red 'Error 500' notification on the import blade. And that's it.

How do I troubleshoot? What's the VC offering in terms of monitoring?

tatarincev commented 8 years ago

You can see the details of the error in your browser: open the developer tools (F12) and inspect the failed request there. Could you also send me your CSV file so I can reproduce the error locally?

amironoff commented 8 years ago

@tatarincev I think I nailed it.

  1. I found the NLog section in web.config and enabled file system logging. It's set to 'Debug' by default; consider changing that to the file system in Release builds, which would be more useful in production.
  2. The log entry at the time of the import failure says: '2016-08-05 10:16:20.3237 Hangfire.Server.BackgroundProcessingServer Processing server takes too long to shutdown. Performing ungraceful shutdown.'
  3. Initial investigation shows that Hangfire is busy running the Lucene indexing job.
  4. The error message comes from the Dispose method of Hangfire's BackgroundProcessingServer: https://github.com/HangfireIO/Hangfire/blob/master/src/Hangfire.Core/Server/BackgroundProcessingServer.cs

My hypothesis is that Lucene indexing takes too long and the Hangfire shutdown timeout is too low. I spun up an external Elasticsearch instance. The import seems to be moving forward now (although pretty slowly).
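To check how often this happens, the log file can be scanned for the shutdown message. This is a minimal Python sketch, not part of VC or Hangfire; it assumes each log line follows the "date time logger message" layout of the entry quoted above, and the function name is made up for illustration.

```python
import re
from typing import Iterable, List

# Message text copied from the log entry quoted above.
SHUTDOWN_MARKER = "Processing server takes too long to shutdown"

# Assumed line layout: "<date> <time> <logger> <message>", as in the quoted entry.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (\S+) (.*)$")


def find_ungraceful_shutdowns(lines: Iterable[str]) -> List[str]:
    """Return the timestamps of log lines that report an ungraceful Hangfire shutdown."""
    timestamps = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and SHUTDOWN_MARKER in match.group(3):
            timestamps.append(match.group(1))
    return timestamps


if __name__ == "__main__":
    sample = [
        "2016-08-05 10:16:20.3237 Hangfire.Server.BackgroundProcessingServer "
        "Processing server takes too long to shutdown. Performing ungraceful shutdown.",
    ]
    print(find_ungraceful_shutdowns(sample))
```

Comparing the returned timestamps with the times at which imports stopped would show whether the two really coincide.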

amironoff commented 8 years ago

I'd strongly suggest you consider improving the transparency of the system. There should be a searchable job log where, for each async job, its state and (if the job crashed) the full error stack trace are available. Right now, there's no easy way to know whether, for example, an import or an index rebuild succeeded.

amironoff commented 8 years ago

@tatarincev here's the (anonymized) csv file I'm uploading.

catalogue.zip

tatarincev commented 8 years ago

Because we use Hangfire as the primary library for all background jobs, you can use its UI for diagnostic purposes. Just open this URL: http://localhost/admin/hangfire

amironoff commented 8 years ago

@tatarincev thanks, I discovered it. However, the Hangfire UI says that everything is cool, no issues whatsoever, all jobs are successful :) Also, it doesn't even display a 24h job log. There are just a few hours' worth of data.

tatarincev commented 8 years ago

I just imported all 15k products locally but ran into a problem with the search indexing job. Hangfire ran multiple instances of the same job and my processor began to smoke :)

tatarincev commented 8 years ago

We need to work on CSV import performance because Entity Framework is not friendly to bulk updates and insertions :)
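The general idea behind a bulk-friendly import can be sketched outside Entity Framework. The Python example below uses an in-memory SQLite database as a stand-in; the table, the columns, and the function names are hypothetical, and this is not the fix that went into VC. It contrasts one statement and one commit per row with batched inserts that share a transaction.

```python
import sqlite3
from typing import List, Tuple

Product = Tuple[str, str]  # (sku, name); hypothetical columns for illustration


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a single product table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT)")
    return conn


def insert_row_by_row(conn: sqlite3.Connection, products: List[Product]) -> None:
    """One INSERT and one commit per product: many small transactions."""
    for sku, name in products:
        conn.execute("INSERT INTO product (sku, name) VALUES (?, ?)", (sku, name))
        conn.commit()


def insert_in_batches(conn: sqlite3.Connection, products: List[Product],
                      batch_size: int = 1000) -> None:
    """One transaction per batch, with the rows handed over in a single call."""
    for start in range(0, len(products), batch_size):
        batch = products[start:start + batch_size]
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany("INSERT INTO product (sku, name) VALUES (?, ?)", batch)


def count_products(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
```

Both functions leave the same rows in the table; the batched variant simply issues far fewer commits, which is usually where a per-row import against a remote database spends its time.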

tatarincev commented 8 years ago

@artem-dudarev We need to disable concurrent execution of the same Hangfire job, such as CatalogIndexJob, and check its performance.
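The idea of not letting two instances of the same job run at once can be illustrated with a small Python sketch. This is not Hangfire's API (Hangfire has its own `DisableConcurrentExecution` attribute for this purpose); the class below is hypothetical and only works inside a single process, whereas a real job server needs a lock shared across workers.

```python
import threading
from typing import Callable


class NonConcurrentJob:
    """Wrap a job so that a second invocation is skipped while the first is running."""

    def __init__(self, name: str, work: Callable[[], None]) -> None:
        self.name = name
        self._work = work
        self._lock = threading.Lock()

    def run(self) -> bool:
        """Run the job unless it is already running; return True if it actually ran."""
        if not self._lock.acquire(blocking=False):
            return False  # another instance holds the lock, so skip this run
        try:
            self._work()
            return True
        finally:
            self._lock.release()
```

A scheduler that fires the index job while the previous run is still busy would get `False` back instead of starting a second, competing run.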

amironoff commented 8 years ago

The logs also contain lots (I estimate thousands) of exceptions of this type:

2016-08-07 07:39:06.3824 default System.ArgumentNullException: Value cannot be null.
Parameter name: s
   at System.IO.StringReader..ctor(String s)
   at System.Xml.Linq.XElement.Parse(String text, LoadOptions options)
   at VirtoCommerce.Platform.Data.Serialization.XmlExpressionSerializer.DeserializeExpression[T](String serializedExpression) in D:\Projects\Training\cms\vc-community\VirtoCommerce.Platform.Data.Serialization\XmlExpressionSerializer.cs:line 20
   at VirtoCommerce.PricingModule.Data.Services.PricingServiceImpl.b__6_0()

Exceptions are rather costly in terms of execution time. Could that be slowing things down too?
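The trace shows a null string reaching the XML parser. One way to avoid paying for an exception on every such record is to check the input before parsing. The Python sketch below illustrates that guard; the function name mirrors the one in the trace but this is not VC code, and it assumes that a missing serialized expression legitimately means "no expression".

```python
import xml.etree.ElementTree as ET
from typing import Optional


def deserialize_expression(serialized: Optional[str]) -> Optional[ET.Element]:
    """Parse a serialized XML expression, returning None when there is nothing to parse."""
    # Guard first: a missing or blank value is treated as "no expression"
    # (an assumption), so no exception is raised and caught for it.
    if serialized is None or not serialized.strip():
        return None
    return ET.fromstring(serialized)
```

Malformed XML still raises, so genuine data errors remain visible in the log.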

amironoff commented 8 years ago

Update: The premature import failure happens when using SQL Azure databases. I tried scaling them up to 50 DTUs, with the same result. When I deployed a SQL Server instance to a dedicated DS2_V2 VM, importing 15k items succeeded in around 24 hours.

tatarincev commented 8 years ago

I'm working on import performance now. Today there will be a fix that allows importing your 15k products in about 1-2 minutes.

tatarincev commented 8 years ago

There was some small additional refactoring, so the release of this fix is postponed to tomorrow.

amironoff commented 8 years ago

@tatarincev VirtoCommerce might be a (relatively) new kid on the block, but your speed of reacting to bug reports is amazing. Great job!

tatarincev commented 8 years ago

I think you'll be surprised most of all by the new CSV import speed :) But there are still some problems with search indexing of large portions of data ;(