episerver / content-ab-testing

Apache License 2.0
1 stars 1 forks source link

Issue with A/B-test causing massive amounts SQL-Queries #44

Open BaloghDaniel opened 6 months ago

BaloghDaniel commented 6 months ago

We have a problem on our site that i think stems from the AB-testing addon.

3 Mars We noticed that the IIS-server was not responding and when we checked our logs before the crash we noticed 351 000+ SQL-commands had been executed in under 5 minutes. This kept going for about 40 minutes. We then recycled the app pool and the site worked fine again.

28 Feb

IIS was not responding and i saw the same messages saying that the same or very similar SQLcommand where being executed two times every millisecond which probably ended up crashing / freezing the entire server.

So we got the same problem 4 days apart.

During this time we had an ongoing A/B-test live on the site. The test has been taken down to try to isolate the problem.

These commands seems to be executed about 10 times per minute all day (i persume because the a/b test evaluates what content to show visitors), but out of nowhere it spikes and goes crazy. These are the typical SQL commands we can see being executed extremly rapidly:

SELECT [t].[Id], [t].[ActualVisitorCount], [t].[ConfidenceLevel], [t].[ContentLanguage], [t].[CreatedDate], [t].[Description], [t].[EndDate], [t].[ExpectedVisitorCount], [t].[IsSignificant], [t].[LastModifiedBy], [t].[ModifiedDate], [t].[OriginalItemId], [t].[Owner], [t].[ParticipationPercentage], [t].[StartDate], [t].[State], [t].[Title], [t].[ZScore], [t3].[Id], [t3].[Conversions], [t3].[CreatedDate], [t3].[IsPublished], [t3].[IsWinner], [t3].[ItemId], [t3].[ItemVersion], [t3].[ModifiedDate], [t3].[TestId], [t3].[Views], [t3].[Id0], [t3].[ConvertedTotal], [t3].[ConvertedTotalCulture], [t3].[CreatedDate0], [t3].[KpiId], [t3].[ModifiedDate0], [t3].[Total], [t3].[TotalMarketCulture], [t3].[VariantId], [t3].[Id1], [t3].[CreatedDate1], [t3].[KpiId0], [t3].[ModifiedDate1], [t3].[Value], [t3].[VariantId0], [t3].[Id2], [t3].[Conversions0], [t3].[CreatedDate2], [t3].[KpiId1], [t3].[ModifiedDate2], [t3].[Performance], [t3].[SelectedWeight], [t3].[VariantId1], [t3].[Weight], [t5].[Id], [t5].[CreatedDate], [t5].[KeyPerformanceIndicatorId], [t5].[ModifiedDate], [t5].[TestId] FROM [tblABTest] AS [t] LEFT JOIN ( SELECT [t0].[Id], [t0].[Conversions], [t0].[CreatedDate], [t0].[IsPublished], [t0].[IsWinner], [t0].[ItemId], [t0].[ItemVersion], [t0].[ModifiedDate], [t0].[TestId], [t0].[Views], [t1].[Id] AS [Id0], [t1].[ConvertedTotal], [t1].[ConvertedTotalCulture], [t1].[CreatedDate] AS [CreatedDate0], [t1].[KpiId], [t1].[ModifiedDate] AS [ModifiedDate0], [t1].[Total], [t1].[TotalMarketCulture], [t1].[VariantId], [t2].[Id] AS [Id1], [t2].[CreatedDate] AS [CreatedDate1], [t2].[KpiId] AS [KpiId0], [t2].[ModifiedDate] AS [ModifiedDate1], [t2].[Value], [t2].[VariantId] AS [VariantId0], [t4].[Id] AS [Id2], [t4].[Conversions] AS [Conversions0], [t4].[CreatedDate] AS [CreatedDate2], [t4].[KpiId] AS [KpiId1], [t4].[ModifiedDate] AS [ModifiedDate2], [t4].[Performance], [t4].[SelectedWeight], [t4].[VariantId] AS [VariantId1], [t4].[Weight] FROM [tblABVariant] AS [t0] LEFT JOIN [tblABKeyFinancialResult] AS [t1] ON [t0].[Id] = [t1].[VariantId] LEFT JOIN [tblABKeyValueResult] AS [t2] ON [t0].[Id] = [t2].[VariantId] LEFT JOIN [tblABKeyConversionResult] AS [t4] ON [t0].[Id] = [t4].[VariantId] ) AS [t3] ON [t].[Id] = [t3].[TestId] LEFT JOIN [tblABKeyPerformanceIndicator] AS [t5] ON [t].[Id] = [t5].[TestId] WHERE [t].[State] = 1 ORDER BY [t].[Id], [t3].[Id], [t3].[Id0], [t3].[Id1], [t3].[Id2]

and

SELECT [t].[Id], [t].[ActualVisitorCount], [t].[ConfidenceLevel], [t].[ContentLanguage], [t].[CreatedDate], [t].[Description], [t].[EndDate], [t].[ExpectedVisitorCount], [t].[IsSignificant], [t].[LastModifiedBy], [t].[ModifiedDate], [t].[OriginalItemId], [t].[Owner], [t].[ParticipationPercentage], [t].[StartDate], [t].[State], [t].[Title], [t].[ZScore], [t3].[Id], [t3].[Conversions], [t3].[CreatedDate], [t3].[IsPublished], [t3].[IsWinner], [t3].[ItemId], [t3].[ItemVersion], [t3].[ModifiedDate], [t3].[TestId], [t3].[Views], [t3].[Id0], [t3].[ConvertedTotal], [t3].[ConvertedTotalCulture], [t3].[CreatedDate0], [t3].[KpiId], [t3].[ModifiedDate0], [t3].[Total], [t3].[TotalMarketCulture], [t3].[VariantId], [t3].[Id1], [t3].[CreatedDate1], [t3].[KpiId0], [t3].[ModifiedDate1], [t3].[Value], [t3].[VariantId0], [t3].[Id2], [t3].[Conversions0], [t3].[CreatedDate2], [t3].[KpiId1], [t3].[ModifiedDate2], [t3].[Performance], [t3].[SelectedWeight], [t3].[VariantId1], [t3].[Weight], [t5].[Id], [t5].[CreatedDate], [t5].[KeyPerformanceIndicatorId], [t5].[ModifiedDate], [t5].[TestId] FROM [tblABTest] AS [t] LEFT JOIN ( SELECT [t0].[Id], [t0].[Conversions], [t0].[CreatedDate], [t0].[IsPublished], [t0].[IsWinner], [t0].[ItemId], [t0].[ItemVersion], [t0].[ModifiedDate], [t0].[TestId], [t0].[Views], [t1].[Id] AS [Id0], [t1].[ConvertedTotal], [t1].[ConvertedTotalCulture], [t1].[CreatedDate] AS [CreatedDate0], [t1].[KpiId], [t1].[ModifiedDate] AS [ModifiedDate0], [t1].[Total], [t1].[TotalMarketCulture], [t1].[VariantId], [t2].[Id] AS [Id1], [t2].[CreatedDate] AS [CreatedDate1], [t2].[KpiId] AS [KpiId0], [t2].[ModifiedDate] AS [ModifiedDate1], [t2].[Value], [t2].[VariantId] AS [VariantId0], [t4].[Id] AS [Id2], [t4].[Conversions] AS [Conversions0], [t4].[CreatedDate] AS [CreatedDate2], [t4].[KpiId] AS [KpiId1], [t4].[ModifiedDate] AS [ModifiedDate2], [t4].[Performance], [t4].[SelectedWeight], [t4].[VariantId] AS [VariantId1], [t4].[Weight] FROM [tblABVariant] AS [t0] LEFT JOIN [tblABKeyFinancialResult] AS [t1] ON [t0].[Id] = [t1].[VariantId] LEFT JOIN [tblABKeyValueResult] AS [t2] ON [t0].[Id] = [t2].[VariantId] LEFT JOIN [tblABKeyConversionResult] AS [t4] ON [t0].[Id] = [t4].[VariantId] ) AS [t3] ON [t].[Id] = [t3].[TestId] LEFT JOIN [tblABKeyPerformanceIndicator] AS [t5] ON [t].[Id] = [t5].[TestId] WHERE [t].[OriginalItemId] = @__originalItemId_0 ORDER BY [t].[Id], [t3].[Id], [t3].[Id0], [t3].[Id1], [t3].[Id2]

I attached a image of our logs to show the rate

Overview

thanks in advance.

johannesmols commented 3 months ago

I think we're having the same issue, with the site handing out timeouts left and right to all users when a test is ongoing. I can't find logs that confirm that it is the exact same issue, but it sure seems like it.

@BaloghDaniel did you find a solution to this?

BaloghDaniel commented 3 months ago

Hi @johannesmols

Unfortunately not. We ended up uninstalling the nuget package. Although, when solving another issue with performance i got an idea that the logging (serilog in our case) itself was the problem. Havent had the chance to try this yet, but i will try to disable the logging on debug and console level and then try a/b-testing again.

Would be very interesting to know if you to use serilog or not?

Thanks.

johannesmols commented 3 months ago

Hi @BaloghDaniel, thanks for the response. Interesting theory with the logging, I'm curious how you got that idea. We are using Serilog too, so I will definitely give disabling it a try too. Although it's tricky since we only have the issue in Prod, presumably because there aren't enough requests/s locally or on test to cause timeouts.

BaloghDaniel commented 3 months ago

We were looking into why episerver were slow in editmode and when we turned of logging on debug / console the loadtime for dojo.js went from <5 seconds to <0.3 seconds. Then i just thought (hoped) maybe this can solve the abtest to, i also think that i saw somewhere (Cant find it ) that someone had a problem with serilog getting caught in an infinityloop.

So its just a shot in the dark. @johannesmols