sumeetchhetri opened this issue 2 years ago
As I've mentioned in the other thread, we will be moving forward with Round 21. It may take an extra week for verification.
Here is a good start, thanks to @billywhizz
Other offending frameworks will be removed over the next 2 to 3 weeks of verification.
If you'd like to assist us in searching for frameworks that might be violating the updated clarification:
This is a description of the process as mentioned in this thread
once i have my PR ready i will do some further testing on other frameworks this week - is there some way you want to divide them up? let me know and i can fit in with whatever your plan is.
@billywhizz Thank you again for your help here. You can divide and conquer as you see fit. As I mentioned in the Round thread, I'm not worried about the bottom tier. Once we get to about ~50% of the performance of the fixed top frameworks, I'm willing to bet there's nobody using this. This is one of those cases where 20% effort will get us 99% of the way there. I'm interested to see what the top results are after the top frameworks are adjusted, and we'll continue down the list as far as we can go.
here are two more i tested yesterday to add to the list.
@billywhizz I think my verification strategy isn't good at detecting batching in the single-query and fortunes test types. I send each HTTP request manually with `curl`, but I bet that batching only occurs in those test types when the app receives many requests in parallel. How are you going about it?
hi @michaelhixson
i'm afraid my process is very manual! :smiley:
i just run the framework benchmark for the specific framework i am looking at. e.g.
./tfb --mode benchmark --type db --test just
when it is starting up, i watch the veth interfaces coming up using ifconfig and the second one is always the one between the web server and database docker instances. i then run a tcpdump-like tool (tcpdump itself should be fine too) on that veth to dump the packets and inspect them manually, checking for the 'S' (Sync) commands on every query on the wire. i had to make a few tweaks to the TE toolset too, to stop it pruning the docker images when the run is finished, so i can rerun without a full rebuild.
from what i can see, you rarely see the commands coalesced on the wire unless whatever pg library the framework in question is using allows this network optimization. it's usually one packet out and one packet back, so it's a bit painful to ensure the Syncs are in place. once i have checked the first few i assume the rest are ok.
i can have a look tomorrow and see if i could automate something if you don't have anything that's easy to use currently?
also, as you say, it should be pretty obvious looking at the packets when you see multiple commands in a single packet. then you need to scan it to ensure every bind and exec has a following sync. i'll see if i can come up with something you can run to do this automatically, but it might take a day or two.
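for reference, this is roughly all the check needs to do - walk the client-to-server byte stream, count the extended-protocol messages and flag any Exec that doesn't get a Sync. a rough python sketch (not the actual tool; it assumes the capture starts at the untyped startup packet, contains only frontend traffic, and ignores SSL negotiation and COPY):

```python
import struct

def verify_pipeline(stream: bytes) -> dict:
    """Count Query/Bind/Execute/Sync frontend messages and flag every
    Execute that is not followed by a Sync before the next Execute."""
    counts = {"Query": 0, "Bind": 0, "Execute": 0, "Sync": 0, "Miss": 0}
    # the startup message has no type byte: int32 length (which includes itself)
    offset = struct.unpack_from("!i", stream, 0)[0]
    pending_exec = False  # an Execute seen with no Sync yet
    while offset + 5 <= len(stream):
        msg_type = chr(stream[offset])
        length = struct.unpack_from("!i", stream, offset + 1)[0]
        if msg_type == "Q":
            counts["Query"] += 1
        elif msg_type == "B":
            counts["Bind"] += 1
        elif msg_type == "E":
            if pending_exec:          # previous Execute never got a Sync
                counts["Miss"] += 1
            counts["Execute"] += 1
            pending_exec = True
        elif msg_type == "S":
            counts["Sync"] += 1
            pending_exec = False
        offset += 1 + length          # type byte + declared message length
    if pending_exec:
        counts["Miss"] += 1
    return counts
```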
Ah ok. You are sending many requests in parallel because you are running the full benchmark.
I was sending a single multi-query request so that the query log and tcpdump output would be small. I think that'd catch most non-compliant implementations in practice, and we'd mark that framework variant non-compliant across all test types (remove the entry from benchmark_config.json or tag it as "broken"). More granular verification would be better, though, so perhaps your approach is better and we just need to read some big logs.
If I wanted to stick with `curl`, I suppose I could try sending requests in parallel with `xargs` to detect batching in single-query test types.
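For example (just a sketch that swaps `xargs` for a Python thread pool; the URL is a placeholder for whatever host/port/path the app under test exposes):

```python
# Fire a burst of concurrent single-query requests so that any
# cross-request batching shows up in the query log / capture.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URL = "http://tfb-server:8080/db"  # placeholder

def fetch(_):
    with urlopen(URL) as resp:
        return resp.status

with ThreadPoolExecutor(max_workers=64) as pool:
    statuses = list(pool.map(fetch, range(512)))

print(sum(1 for s in statuses if s == 200), "of", len(statuses), "requests OK")
```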
A more automated approach would be nice even if it's not integrated directly with the toolset. Earlier I tried using postgres stats tables and that didn't work out. A contribution would be welcome although it may be a lot of work. We can tough it out with the current manual verification process for this round at least.
you could just use wrk instead of curl?
i'll have a look tomorrow and see if i can come up with anything that would make the process easier/automatable.
hi @michaelhixson. i have something here which should be able to act as a proxy between the web server and the database and keep track of all the messages on the wire. i should be able to package it up as a docker container that you could run to test various frameworks. i could submit as a PR for TE repo under toolset/ somewhere or as a separate repo if you don't want it in TE one.
let me know if you think it would be useful, if you haven't already figured out a solution. i should be able to have it ready for tomorrow AM if you're interested.
currently the output looks something like this for /query?q=10, but i can do something to summarise results and count up the bind/exec/sync messages or verify that every bind/exec is followed by a sync.
@billywhizz That's certainly more readable than the tcpdump output I was looking at.
Could you push the code to a separate repository so we could look at it?
@michaelhixson i have something basically working and integrated with the TE repo but, as ever, i have some unexpected issues to resolve. hopefully i can get them sorted this evening and will ping you as soon as i have something reliable you could use without much effort.
i've pushed the latest to my fork so you can have a look, but i have things i need to fix. as i say, hopefully will have it working soon.
@michaelhixson am afraid i am failing miserably in my task here. i'm dealing with lots of weird network behaviour from different frameworks, so i think the manual test is the only way to go for now. i will keep plugging away on this anyway as i want to get it working. if i come up with something reliable over the weekend i will let you know!
@billywhizz ok no problem! Thanks for looking into it.
@michaelhixson @nbrady-techempower i have finally got this proxy tool working to verify the postgres pipelining. it's checked into my fork here. i can do a PR if you like or you can just clone from my repo for now and give it a whirl.
to run, just add "--proxy on" to the command line arguments of the test when doing the verification run.
e.g. for a verification of a single framework with all tests
./tfb --mode verify --test just --proxy on
or for a single test
./tfb --mode verify --test just --proxy on --type db
you should get output at the end of the test like this
the results tell you the total number of Query, Bind, Exec and Sync commands for each database connection, and the "Miss" field tells you the number of Exec commands that were not directly followed by a Sync command. The status will display "ok" in green if no Sync commands were missed and the total number of Sync commands is greater than or equal to the total number of Exec commands; otherwise you will see "fail" in red.
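in other words, the status check boils down to something like this (a paraphrase in python, not the actual proxy code):

```python
def status(exec_count: int, sync_count: int, missed: int) -> str:
    """Pass only if no Exec was left without a Sync and Syncs >= Execs."""
    ok = missed == 0 and sync_count >= exec_count
    return "ok" if ok else "fail"
```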
The proxy code is a bit of a mess as i just hacked on it until i got it working. I'll have a look in the coming days to see if i can clean it up some, but i have tested it against the top 20 or so frameworks without any errors. I'll do a run through them all here later on Sunday and will collate what I find and post it up here.
Hopefully it's useful and saves you some time and, if you merge it, it might also be useful for framework maintainers so they can do this verification themselves before submitting PRs.
@michaelhixson @nbrady-techempower
i have written a little script to automate running the verification and have run the verification for the top 20 or so frameworks. there is a gist here which you can clone and follow the instructions to run this yourself.
you can see the results here
i'll make some more improvements and try to investigate some of the unusual results over next couple of days. feel free to ping me if you want to discuss or want me to create a PR to have this verification included in FrameworkBenchmarks repo.
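if you'd rather roll your own version of the script, the basic shape is just a loop that shells out to the toolset with the proxy enabled. a rough python sketch (the framework names are only examples, not the list from the gist, and the pass/fail parsing is a guess):

```python
import subprocess

FRAMEWORKS = ["just", "drogon", "redkale"]  # illustrative list only

results = {}
for name in FRAMEWORKS:
    proc = subprocess.run(
        ["./tfb", "--mode", "verify", "--test", name, "--proxy", "on"],
        capture_output=True, text=True,
    )
    # treat a non-zero exit or a "fail" in the output as a failure
    results[name] = proc.returncode == 0 and "fail" not in proc.stdout.lower()

for name, passed in sorted(results.items()):
    print(f"{name}: {'ok' if passed else 'fail'}")
```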
@billywhizz This is fantastic! Absolutely would love to have this in the toolset. I know we said to hold off because of the new verifier, but that's taking longer than expected with client-facing work in-house.
@nbrady-techempower great! i have some spare time at moment so i will tidy things up later today and get a PR ready.
i will also do some more detailed investigation on the frameworks whose results are a bit confusing and will report back on anything i find.
@nbrady-techempower @michaelhixson I have opened a PR here for the new Postgres protocol verification tool.
@an-tao I also re-ran verification for drogon with the latest PR merged and it is now passing all the verification tests. :partying_face:
The gist with the script for automatically running the verifications and producing a report has been updated here. You can also see the latest report from my local run here.
To summarise, out of top 20 or so frameworks, the only one now failing in a serious way is beetlex, so maybe @beetlex-io can take a look at why that might be happening?
There are a number of frameworks that seem to be failing the verification only on the update test and they all show the exact same results (10728 missing syncs). This could be down to a bug in my verification tool or possibly even a bug in some underlying library like libpq that they might all be using. I will do some further digging to see why these are showing up as failures. To save you time, these are the frameworks showing this failure at the moment:
> There are a number of frameworks that seem to be failing the verification only on the update test and they all show the exact same results (10728 missing syncs).
Is it because they're batching the UPDATE statements? That's allowed [*], whereas batching the SELECT statements for that test is not.
For example, for a request like `/updates?count=20`, I would expect a compliant implementation to issue 20(B+E+S) for the SELECT part, and then additionally: either (a) 1(B+E+S) if using a single bulk UPDATE statement, (b) 20(B+E+S) if using non-batched UPDATE statements, or (c) 20(B+E) + 1(S) if using a batch of UPDATE statements.
Does your tool monitor a known number of requests? If so, perhaps it could calculate those three different sets of expected B, E, S counts and verify it's (close to) one of those, or in that range? Not sure that's feasible.
The tool would still be useful even if we couldn't use it on the updates test.
[*] I don't think our rules mention this explicitly, but each batch of UPDATE statements should be confined to one request. I have a feeling we can't verify that effectively and that no one is doing cross-request update batching anyway.
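To make the accounting concrete, here is a small sketch of the acceptable (Bind, Exec, Sync) totals per request for the three patterns above (illustrative only, not part of any tool):

```python
def acceptable_update_counts(count: int) -> set:
    """Acceptable (Bind, Execute, Sync) totals for one /updates?count=N request."""
    select = (count, count, count)      # count * (B+E+S) for the SELECTs
    update_shapes = [
        (1, 1, 1),                      # (a) single bulk UPDATE statement
        (count, count, count),          # (b) one UPDATE per row, each with a Sync
        (count, count, 1),              # (c) batched UPDATEs with one trailing Sync
    ]
    return {tuple(s + u for s, u in zip(select, shape)) for shape in update_shapes}

# for count=20 this gives {(21, 21, 21), (40, 40, 40), (40, 40, 21)}
```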
@michaelhixson that makes sense i think, so you are saying, for the frameworks that don't use a single bulk SQL statement to do all the updates, it is acceptable to batch together 20 Bind + Execs with a single Sync? My verification tool will flag that as a failure for sure if so.
I'll see if i can modify the verification to allow this behaviour and re-run tests.
if you are correct about this behaviour then i think we are pretty close to saying all those frameworks on my list are now compliant.
Also, re. beetlex, @beetlex-io - it seems there are only 3 missing syncs on all tests so this could be some weird edge case I am not handling - i don't think it will require changes on your end as the overall numbers match up exactly except for those 3 missing syncs on every test.
> so you are saying, for the frameworks that don't use a single bulk SQL statement to do all the updates, it is acceptable to batch together 20 Bind + Execs with a single Sync?
Right. I looked briefly at the code for a couple on your list, and this is the approach they take.
IIRC this approach is slower than one bulk update statement (in postgres at least), but it's not against the rules.
@michaelhixson that makes sense then - i thought it was odd so many were showing the exact same issue. i'll see if i can adjust the verifier to handle this. thanks!
so, it looks like all frameworks on that list (barring the minor issue with beetlex) are compliant, assuming the ones currently showing update failures are all using this technique. :partying_face: :champagne:
just an update - i have checked the code for all the frameworks mentioned above and they are all using the technique mentioned by @michaelhixson of batching together the Bind/Execs with a single Sync for the update query. I'm still seeing if i can automatically verify this without a huge amount of work, but I think it is safe to say all the frameworks on my list are now compliant with the postgres pipelining requirement.
@nbrady-techempower @michaelhixson FYI - i got redkale to run on my local tests and it seems to not be compliant with the pipelining requirements.
i also checked the query test manually and verified the behaviour is non-compliant across redkale, redkale-native and redkale-graalvm frameworks.
i already flagged this up in the previous thread but there has been no response as yet from @redkale and this framework is the only one, along with drogon (which seems to have been patched since), to show the outsize scores on the latest test run.
@redkale - can you submit a PR to remove the pipelining please?
i have a couple of tweaks to push to my verifier/proxy PR later and hopefully i can run through all the tests again.
@billywhizz I updated pg client
@redkale How did you do this without making a PR?
Does validation run on CI runs? Because if external dependencies can be updated without submitting a PR we might miss some regressions (floating dependencies, local GitHub repo clones).
@redkale @nbrady-techempower @sebastienros
i just ran the three redkale configurations against my local copy of TE master branch and got the following result with the verifier.
i then deleted the docker images (i had disabled the pruning so i could avoid rebuilding on every run) and re-ran with freshly built docker images and got the following result.
so, something has indeed changed even though no PR was submitted. i'm no expert in java release management, so i can only guess that a dependency with the change included has been pushed to maven? technically, this should not be allowed. of course it would be possible for various frameworks to force-push dependencies with the same version numbers, but it has been stated explicitly on more than one occasion that this is not allowed and that the code checked into the TE repo must be pinned to specific releases in order to avoid people pushing fixes/updates "by stealth".
it also seems the redkale-graalvm configuration is still pipelining so not sure what is going on there. :man_shrugging:
just to be clear, am not suggesting anything underhand from @redkale - i assume it's an honest misunderstanding of the rules/conventions around releasing new versions of frameworks.
@redkale would you be able to clarify re. these questions?
many thanks. :pray:
here are the logs from my clean build of the redkale docker image - it might make some sense to folks with more experience of java/maven release management.
redkale: Step 1/8 : FROM maven:3.8.4-openjdk-17-slim as maven
redkale: ---> 849a2a2d4242
redkale: Step 2/8 : WORKDIR /redkale
redkale: ---> Running in 1600c7ce1aa3
redkale: Removing intermediate container 1600c7ce1aa3
redkale: ---> c5216fc4e58d
redkale: Step 3/8 : COPY src src
redkale: ---> 19161579e34a
redkale: Step 4/8 : COPY conf conf
redkale: ---> 40436b911a46
redkale: Step 5/8 : COPY pom.xml pom.xml
redkale: ---> 58a77bb4c5d0
redkale: Step 6/8 : RUN mvn package -q
redkale: ---> Running in 2737d6c7c3b5
redkale: 2022-06-10 18:24:27.752 INFO org.redkale.boot.Application <init>
redkale: -------------------------------- Redkale 2.7.0 --------------------------------
redkale: 2022-06-10 18:24:27.791 INFO org.redkale.boot.Application init
redkale: APP_OS = Linux 5.4.0-113-generic amd64
redkale: APP_JAVA = OpenJDK Runtime Environment 17.0.2+8-86
redkale: APP_PID = 6
redkale: APP_NODEID = 0
redkale: APP_LOADER = RedkaleClassLoader
redkale: APP_ADDR = 172.17.0.2:8585
redkale: APP_HOME = /redkale
redkale: APP_CONF_DIR = file:/redkale/conf/
redkale: 2022-06-10 18:24:28.515 INFO org.redkale.boot.NodeServer init
redkale: NodeHttpServer load filter class in 575 ms
redkale: 2022-06-10 18:24:28.558 INFO org.redkale.boot.Application loadDataSource
redkale: [Redkale-HTTP:8080-Thread] Load DataSource resourceName = , source = PgsqlDataSource{}
redkale: 2022-06-10 18:24:28.562 INFO org.redkale.boot.NodeServer loadService
redkale: [Redkale-HTTP:8080-Thread] LocalService (type= org.redkalex.benchmark.BenchmarkService , name='') load
redkale: [Redkale-HTTP:8080-Thread] All Services load cost 46 ms
redkale: 2022-06-10 18:24:28.660 INFO org.redkale.boot.NodeHttpServer loadHttpServlet
redkale: [Redkale-HTTP:8080-Thread] Load RestDynServlet (type=org.redkalex.benchmark.BenchmarkService, name='') mapping to [/cached-worlds, /db, /fortunes, /json, /plaintext, /queries, /updates]
redkale: [Redkale-HTTP:8080-Thread] All HttpServlets load cost 95 ms
redkale: 2022-06-10 18:24:28.663 INFO org.redkale.boot.Application start
redkale: -------------------------- Redkale started in 935 ms --------------------------
redkale: 2022-06-10 18:24:28.739 INFO org.redkale.net.Server shutdown
redkale: HttpServer-TCP shutdowning
redkale: 2022-06-10 18:24:28.740 INFO org.redkale.net.Server shutdown
redkale: HttpServer-TCP shutdow prepare servlet
redkale: 2022-06-10 18:24:28.746 INFO org.redkale.net.Server shutdown
redkale: HttpServer-TCP shutdown in 2 ms
redkale: 2022-06-10 18:24:28.748 INFO org.redkale.boot.Application shutdown
redkale: PgsqlDataSource{} destroy in 0 ms
redkale: 2022-06-10 18:24:28.752 INFO org.redkale.boot.Application shutdown
redkale: AsyncGroup destroy in 3 ms
redkale: Removing intermediate container 2737d6c7c3b5
redkale: ---> 6d54e7c4b178
redkale: Step 7/8 : EXPOSE 8080
redkale: ---> Running in 77d3d1cf63e8
redkale: Removing intermediate container 77d3d1cf63e8
redkale: ---> d17242256a31
redkale: Step 8/8 : CMD ["java", "-server", "-XX:+UseNUMA", "-XX:+UseParallelGC", "-XX:AutoBoxCacheMax=80000", "-DAPP_HOME=./", "-jar", "/redkale/target/redkale-benchmark-1.0.0.jar"]
redkale: ---> Running in 893579ef203d
redkale: Removing intermediate container 893579ef203d
redkale: ---> 1e478befca0c
redkale: [Warning] One or more build-args [BENCHMARK_ENV TFB_TEST_DATABASE TFB_TEST_NAME] were not consumed
redkale: Successfully built 1e478befca0c
redkale: Successfully tagged techempower/tfb.test.redkale:latest
redkale: Build time: 1m 5s
redkale: 2022-06-10 18:24:46.362 INFO org.redkale.boot.Application <init>
redkale: -------------------------------- Redkale 2.7.0 --------------------------------
redkale: 2022-06-10 18:24:46.426 INFO org.redkale.boot.Application init
redkale: APP_OS = Linux 5.4.0-113-generic amd64
redkale: APP_JAVA = OpenJDK Runtime Environment 17.0.2+8-86
redkale: APP_PID = 7
redkale: APP_NODEID = 0
redkale: APP_LOADER = RedkaleCacheClassLoader
redkale: APP_ADDR = 172.19.0.5:8585
redkale: APP_HOME = /redkale
redkale: APP_CONF_DIR = file:/redkale/conf/
redkale: 2022-06-10 18:24:46.612 INFO org.redkale.boot.NodeServer init
redkale: NodeHttpServer load filter class in 55 ms
redkale: 2022-06-10 18:24:46.671 INFO org.redkale.boot.Application loadDataSource
redkale: [Redkale-HTTP:8080-Thread] Load DataSource resourceName = , source = PgsqlDataSource{url=jdbc:postgresql://tfb-database:5432/hello_world}
redkale: 2022-06-10 18:24:46.769 INFO org.redkale.boot.NodeServer loadService
redkale: [Redkale-HTTP:8080-Thread] LocalService (type= org.redkalex.benchmark.BenchmarkService , name='') load and init in 94 ms
redkale: [Redkale-HTTP:8080-Thread] All Services load cost 157 ms
redkale: 2022-06-10 18:24:46.800 INFO org.redkale.boot.NodeHttpServer loadHttpServlet
redkale: [Redkale-HTTP:8080-Thread] Load RestDynServlet (type=org.redkalex.benchmark.BenchmarkService, name='') mapping to [/cached-worlds, /db, /fortunes, /json, /plaintext, /queries, /updates]
redkale: [Redkale-HTTP:8080-Thread] All HttpServlets load cost 27 ms
redkale: 2022-06-10 18:24:46.822 INFO org.redkale.net.Server start
redkale: [Redkale-HTTP:8080-Thread] HttpServer listen: 0.0.0.0:8080, cpu: 8, responsePoolSize: 1024, bufferPoolSize: 64, bufferCapacity: 32K, maxbody: 64K, started in 498 ms
redkale: 2022-06-10 18:24:46.823 INFO org.redkale.boot.Application start
redkale: -------------------------- Redkale started in 505 ms --------------------------
i can't see any recently updated repos here on redkale github.
redkale maven plugin seems to have been last updated in oct 2021.
here is the redkale pom.xml from TE repo.
@nbrady-techempower @michaelhixson one thing i have run into in this verification is the fact that siege always seems to send more requests than it is asked to. for example, in the /db test i always see 516 or 517 requests instead of the 512 requested. have you seen this behaviour? if so, any idea on how to mitigate it? i have been searching online for answers fruitlessly. :man_shrugging:
@billywhizz Hmm I'm not sure. This comment in our toolset indicates that siege results in a reliable number of queries. We tolerate a range for some of the related counts, though. Would that strategy work here? That is, consider a count valid if it's within 1-2% of the expected value?
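Something along these lines, for example (a hypothetical helper matching the 1-2% suggestion):

```python
def within_tolerance(observed: int, expected: int, pct: float = 2.0) -> bool:
    """True if the observed request/query count is within pct% of expected."""
    return abs(observed - expected) <= expected * pct / 100.0

# siege reporting 516 requests when 512 were requested is well within 2%
assert within_tolerance(516, 512)
```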
@michaelhixson the behaviour is not making much sense to me tbh. i see siege reporting 512 transactions in output but i see 516 requests received on http server consistently. still haven't figured out why.
calculating a margin of error won't help me much as i am trying to find a tricky way to verify the update queries without having to do lots of complex logic keeping track of all the in-flight select/update queries. thanks for getting back. i'll keep plugging away! :smiling_face_with_tear:
@redkale @nbrady-techempower i just re-ran redkale-graalvm with a full docker cleanup of all intermediate images and it does now also seem to be passing verification.
but the questions for @redkale from above are still valid i think:
- how did you push a new version of a dependency without submitting a PR for review by the TE team?
- when did you do the push? it seems it was after the last run, in which redkale is still showing the outsize scores that appear to rely on postgres pipelining?
@b I am Chinese. Due to national policies, our access to GitHub is very slow and sometimes we can't access it at all, so I put the redkale code on Gitee (the Chinese equivalent of GitHub): https://gitee.com/redkale/redkale. After each official release it is synchronized to GitHub. Redkale depends on a Maven snapshot (https://www.tutorialspoint.com/maven/maven_snapshots.htm); it is impossible to release an official version for every verification and minor adjustment.
@billywhizz https://oss.sonatype.org/content/repositories/snapshots/org/redkale/redkale/2.7.0-SNAPSHOT/ Here you can see the update time, Fri Jun 10 16:24:03 UTC 2022 - the last one is the latest.
thanks for the update @redkale! sorry for bugging you - am just trying to make sure you don't get excluded from the latest round. :pray:
i'll leave it up to the TE guys re. the policy of pushing updates but if this change you have pushed removes pipelining i think all should be ok for inclusion in the latest round.
@redkale @nbrady-techempower @michaelhixson this seems to be the change that ensures a sync with every bind/exec in the pg extended protocol. thanks again @redkale! https://gitee.com/redkale/redkale-plugins/commit/71fbe8b109080283c8be976476c2b270944fa6ec#c9de9ebf5c1b8a8cd25c4132489eec284a29579e_166_166
@nbrady-techempower @michaelhixson
any update on the plan for next round? as far as i understood further changes and updates were not allowed while testing was continuing, but it seems multiple frameworks have recently had minor tweaks and changes merged. https://github.com/TechEmpower/FrameworkBenchmarks/pulls?q=is%3Apr+is%3Aclosed
hmm. it seems these are all automated dependabot updates to dependencies. should these not be turned off while testing/verification for next round is ongoing?
@nbrady-techempower @michaelhixson
these automated dependabot updates also raise another issue. currently, as far as i understand, i am required to release a framework pinned to specific releases of any dependencies and submit a PR when i want to upgrade those dependencies, but these frameworks get automatically updated by dependabot. this doesn't seem right/fair, no? i would like to be able to do the same with just-js, where the dependencies could be updated without submitting a PR - is this acceptable?
@billywhizz Dependabot updates are not for simple version bumps; they're only for vulnerabilities. In fact, I merge almost all dependabot updates in even if the framework doesn't pass, because I don't want vulnerable representations in the repository that somebody might copy to start a project. Hopefully that helps clear that up.
that's fine @nbrady-techempower. thanks for clearing that up - sorry for bugging you!
No worries! Thanks for your patience while we get this new round out with the new team!
if you need me to do any further testing here let me know. i have got my proxy tool into a reasonable state now, but there are still so many edge cases in framework behaviour it doesn't handle easily that i'm not sure it is worth a PR right now. but i'm happy to test things one by one here if you need an extra pair of hands.
@billywhizz I don't want to burden you, but if you have the chance to do another round of testing with your tool, that'd be great! I'd like to start the round 21 official run by Wednesday and just want to make sure we caught everything. Might not be able to get around to testing myself for a couple of days.
hi @nbrady-techempower. i don't have much time this week but might be able to do a run through over the weekend. from my testing so far, all the top 20 or so frameworks appear to be compliant. i.e. everything on this list is fine - i checked the update failures and they were all due to allowed behaviour we discussed previously.
Thanks @billywhizz! I'm going to start the run today and we can always remove things from the results if they're not compliant.
@pavelmash FYI mORMot (`mormot-postgres-raw`) is breaking the rules in the database queries and updates tests - refer to the discussion here.
P.S. Since it appears that mORMot is using `libpq`, I can be more specific - there must be a call to `PQpipelineSync()` after every single query in the pipeline.
Thanks @volyrique - I had actually missed that a note about Sync was added to General test requirement #7 on Mar 8, 2022. The `mormot-postgres-raw` pipelining implementation will be fixed ASAP.
The mormot test for PG pipelining now uses Bind->Exec->Sync as required by General Rule 7 - see #7926. Thanks again for pointing this out.
We need a definitive list of all frameworks using postgresql (pipelining/batching) directly or indirectly; there are multiple frameworks that rely on external libraries (rust/java/.net/possibly other languages) which internally rely on pipelining.
Who would we be relying on to provide such a list, and do we know what it takes to find all the offenders against this new rule?
IMHO the first step should be to clearly identify the frameworks not adhering to the new spec; once such a list is officially available, we would be in a position to get this change ready for Round 21.
Until there is such a list, Round 21 should be postponed indefinitely.