TechEmpower / FrameworkBenchmarks

Source for the TechEmpower Framework Benchmarks project
https://www.techempower.com/benchmarks/

Enhancement of the verification procedure #5222

Open jcheron opened 4 years ago

jcheron commented 4 years ago

The objective is to identify possible improvements of the new verification procedure:

Too strict checking for db queries?

Sometimes the test fails for 1 or 2 missing db queries (out of 20480 on Update test), while the other statistics are within the acceptance margin. This is the case for the following frameworks:

As the statistics on the number of db queries seemed reliable, we had not allowed a margin on this number.

I'd be interested to know why 1 or 2 queries are missing... Perhaps we could also add an acceptance margin on the query count for these cases?
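For discussion's sake, such a margin check could be sketched like this (a hypothetical helper, not the toolset's actual code; the function name and the 1% figure are illustrative):

```python
def within_margin(observed_queries, expected_queries, margin=0.01):
    """Accept a small shortfall in the observed query count.

    margin=0.01 tolerates up to 1% missing queries, e.g. a couple
    out of 20480 on the Update test. Hypothetical helper, not the
    verifier's real implementation.
    """
    return observed_queries >= expected_queries * (1 - margin)

# 20478 of 20480 queries observed: passes with a 1% margin...
print(within_margin(20478, 20480))        # True
# ...but fails under an exact check (margin=0)
print(within_margin(20478, 20480, 0.0))   # False
```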

Failures due to the internal processing of the Framework

The case has already been identified for java/hibernate, which caches queries it has already executed; because the test picks numbers at random, repeated numbers were served from the cache instead of the database, which distorted the results for the Query and Update tests. The solution was to randomly select unique numbers.
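The unique-number fix can be sketched in a few lines (a minimal illustration, assuming the usual 1..10000 id range; `unique_random_ids` is a hypothetical helper, not the benchmark code):

```python
import random

def unique_random_ids(n, max_id=10000):
    # random.sample draws without replacement, so no id repeats
    # within one request and a first-level ORM cache cannot
    # short-circuit any of the n queries.
    return random.sample(range(1, max_id + 1), n)

ids = unique_random_ids(20)
print(len(set(ids)))  # 20 distinct ids, no cache hits possible
```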

Other frameworks, Java or not, may be in the same situation; I suspect it is the case for the following frameworks:

zloster commented 4 years ago
  1. About Hibernate: AFAIR most of the Java projects that are using Hibernate were updated to use the JPA interfaces. If the project is on Hibernate version 5.4.1 this should be true. Hibernate and EclipseLink are implementations of the JPA interface/standard. I see that Java/act with EclipseLink doesn't have a problem. I haven't inspected the code to check what's going on there.

So it should be possible, and relatively easy, to migrate the projects to EclipseLink* as the JPA provider. I don't see this option mentioned in the previous discussions.

*I haven't done such migration.

Hibernate also has the StatelessSession, but it cannot be used through the JPA interfaces. The difference from the normal Session is exactly the absence of the first-level object cache. It's an option, but I wouldn't advise it.

  2. About the usage of 512 concurrency: Updated (see here): Currently MySQL has a weird default configuration activated: if there are 200 or more pending write transactions, it throws a TransactionDeadlockException. You will stumble upon such examples. So far I've tried lowering the maximum number of connections in the connection pool, but the exceptions are still there (grizzly-jersey and undertow-jersey). How the code manages to create 200 pending transactions from 192 connections is another discussion. Only the Java/act-eclipselink-mysql and Go/fasthttp frameworks are not throwing such an error. The results are from the run before the stricter validation. Added: another case is a false positive when the framework is configured in a particular way: https://github.com/TechEmpower/FrameworkBenchmarks/pull/5225#issuecomment-553337876

Most probably this default behaviour should be disabled.

jcheron commented 4 years ago

I have finally reconsidered my position on tests that fail for 1 or 2 missing db queries. I think it would be wrong to add an acceptance margin at this time, as it would mask the issue.

Having modified the code of some frameworks so that they pass the tests again, I find that, so far, the failures, even marginal ones, have real causes. They therefore help improve the benchmark code.

The case of Symfony is for the moment one of the most revealing: it was simply missing 1 or 2 db queries on the Update test. Flushing the 20 updates in a single transaction was most likely the cause (to be confirmed in the next runs).

What do you think, @nbrady-techempower ?

NateBrady23 commented 4 years ago

@zloster What do you think about setting innodb_table_locks = 0 with innodb-lock-wait-timeout? That would resolve the issue of 200 or more pending transactions. The question is what the difference in the test would be when an actual deadlock occurs.

@jcheron I'm inclined to agree with you here. Let's keep it as is for now. It's helping discover real issues and that's a huge win.

zloster commented 4 years ago

@nbrady-techempower My understanding is that innodb_deadlock_detect should be OFF. Also, innodb_lock_wait_timeout should have the same value as in PostgreSQL. For the very short bursts used in benchmarking, IMO the value should be in the 15-30 second range. That way we get an exception fairly early.
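Put together, that suggestion would amount to a my.cnf fragment along these lines (the 30-second value is illustrative; it has not been validated against the benchmark environment):

```ini
[mysqld]
# skip eager deadlock detection; rely on the lock wait timeout instead
innodb_deadlock_detect = OFF
# abort a waiting transaction after 30 s, matching the short benchmark bursts
innodb_lock_wait_timeout = 30
```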

jderusse commented 4 years ago

We figured out in Symfony how to fix the deadlock issue (for the record, a deadlock occurs when process 1 updates A then B while process 2 updates B then A). With 512 concurrent requests of 20 queries each, there is a non-negligible probability of updating the same rows in a different order.

To solve that, Symfony now orders updates by id (#5237).
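The fix generalizes beyond Symfony: if every transaction acquires its row locks in the same (here, ascending-id) order, two transactions can no longer each hold a lock the other one needs. A minimal sketch, with `execute_update` standing in for a hypothetical per-row UPDATE call:

```python
def apply_updates(updates, execute_update):
    """updates: list of (row_id, new_value) pairs.

    Sorting by row id gives every transaction the same lock-acquisition
    order, ruling out the A-then-B vs B-then-A deadlock described above.
    """
    for row_id, new_value in sorted(updates):
        execute_update(row_id, new_value)

executed = []
apply_updates([(7, 1), (3, 9), (5, 2)], lambda r, v: executed.append(r))
print(executed)  # [3, 5, 7] -- always ascending, regardless of input order
```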

jderusse commented 4 years ago

Regarding the initial issue ("Too strict checking for db queries?"): there is a problem with some (too optimized) ORMs. For instance, by default PHP's Doctrine ORM does not execute an UPDATE query if nothing has changed. With 512 concurrent requests of 20 updates each (10,240 queries in total), when setting the value to a random 5-digit number there is a non-negligible chance of picking the row's current value and never triggering the update (a roughly 65% chance that at least one update is a no-op).

In my opinion the rules are respected, but the verification is too strict: if the value did not change, no SQL query has to be executed. I suggest either allowing 1 missing query per 10,000 requests, or using 7-digit random values to reduce the collision probability to 0.2%.
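Those probabilities can be sanity-checked in a few lines (assuming 10,240 independent updates and a uniformly random new value; the exact figures depend on the value range, so this lands near, not exactly on, the percentages quoted above):

```python
n_updates = 512 * 20  # 10240 update queries per verification run

def p_at_least_one_noop(value_range):
    # chance that at least one update draws the row's current value,
    # so an ORM that skips no-op updates issues one query fewer
    return 1 - (1 - 1 / value_range) ** n_updates

print(round(p_at_least_one_noop(10_000), 2))      # ~0.64 with values in 1..10000
print(round(p_at_least_one_noop(10_000_000), 4))  # ~0.001 with 7-digit values
```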

jcheron commented 4 years ago

Doctrine is certainly optimized, which caused the first-level-cache problem, but the existing code for the Doctrine-Symfony Update test did not properly manage concurrent access to the database (deadlocks). That is now solved, partly thanks to this missing margin of error.

The difficulty with a more flexible check is that deadlocks generating just a few missing queries would go unnoticed, and other phenomena, not yet identified, might never be.

zloster commented 4 years ago

@nbrady-techempower @jderusse I stand corrected. I've updated the post.

NateBrady23 commented 4 years ago

Thanks everyone for putting so much effort into this!

zloster commented 4 years ago

@nbrady-techempower @jcheron I get an error when I run vagrant@TFB-all:~$ tfb --mode verify --test dropwizard-postgres --concurrency-levels 128.

Verifying test fortune for dropwizard-postgres caused an exception: can't multiply sequence by non-int of type 'float'
   FAIL for http://tfb-server:9090
     Caused Exception in TFB
                 This almost certainly means your return value is incorrect,
                 but also that you have found a bug. Please submit an issue
                 including this message: can't multiply sequence by non-int of type 'float'
     Traceback (most recent call last):
       File "/FrameworkBenchmarks/toolset/benchmark/framework_test.py", line 113, in verify_type
         results = test.verify(base_url)
       File "/FrameworkBenchmarks/toolset/benchmark/test_types/fortune_type.py", line 47, in verify
         problems += verify_queries_count(self, "fortune", url, concurrency, repetitions, expected_queries, expected_rows)
       File "/FrameworkBenchmarks/toolset/benchmark/test_types/verifications.py", line 433, in verify_queries_count
         problems.append(display_queries_count_result(queries * queries_margin, expected_queries, queries, "executed queries", url))
       File "/FrameworkBenchmarks/toolset/benchmark/test_types/verifications.py", line 451, in display_queries_count_result
         if result > expected_result * 1.05:
TypeError: can't multiply sequence by non-int of type 'float'

The whole console log is in this Github gist.

zloster commented 4 years ago

@jcheron @nbrady-techempower Is it possible to add some form of delay before the verification queries are executed? For example, a command-line option with a default of 0 seconds. I'm having a hard time finding out why the dropwizard/PostgreSQL variants are not able to pass the verifications.

NateBrady23 commented 4 years ago

What about just starting with --mode debug so you can send your own requests?

zloster commented 4 years ago

With --mode debug and manual execution of the reporting query extracted (more or less) from the code, I don't get discrepancies.

NateBrady23 commented 4 years ago

@zloster I’m on mobile right now so I can’t verify. Can you try putting --concurrency-levels [128] (as a list)? If that’s the issue I’ll update the toolset tomorrow to fix that.

zloster commented 4 years ago

@nbrady-techempower I think we have a misunderstanding: the change to concurrency-levels is OK and working. My request is about adding some sort of configurable delay before the verification SQL queries that gather the statistics on the number of executed DB operations are run.

NateBrady23 commented 4 years ago

Oh, sorry! I was also reading the earlier post and didn’t realize it was from November. But sure, I can take a look at that tomorrow for you.