Open owensdj opened 7 months ago
hey @owensdj, thanks for this! Could you change the _onhold
suffix to _bench
? This way it will be automatically run by the CI. You'll also need to re-generate the CI workflow with ./generate_ci.sh >.github/workflows/build.yml
, see https://github.com/LesnyRumcajs/grpc_bench/pull/364#issuecomment-1618984958
I changed the suffix to _bench and re-generated the CI workflow. Hopefully I did it correctly.
Any idea why it's failing? I kept the original Dockerfile for Dart other than bumping the Dart image version to 3.3.4. Maybe a problem with the Dockerfile?
Any idea why it's failing? I kept the original Dockerfile for Dart other than bumping the Dart image version to 3.3.4. Maybe a problem with the Dockerfile?
Hey, I think the script to generate the CI is not portable and is acting funny on Windows (I have no means of testing that...). Could you please try switching the order of entries for the generated job? That is, dart
after d
.
I ran the script to generate the CI in Debian running in a VM rather than on Windows, so that shouldn't be an issue. I'll look and see if I can switch the order of the entries. Thanks.
I ran the script to generate the CI in Debian running in a VM rather than on Windows, so that shouldn't be an issue. I'll look and see if I can switch the order of the entries. Thanks.
If you could add all the details of your environment, including bash
and sort
version and put it into an issue, that'd be great. All in all, the output of the generate_ci.sh
should be deterministic, apparently it isn't - seems like the sort
isn't sorting entries identically (or there's something else going on).
I ran the script to re-generate the CI workflow under Windows Subsystem for Linux instead of Debian and the build.yml is different now, for some reason. I created a new commit for it.
@owensdj Looking at the CI, it fails to build. Could you please take a look?
Where is the protobuf file the CI is using? Maybe I used a different version of it. I didn't see it in the repo so I used the helloworld.proto file from the grpc examples page.
Where is the protobuf file the CI is using? Maybe I used a different version of it. I didn't see it in the repo so I used the helloworld.proto file from the grpc examples page.
The protobuf for the general case is here https://github.com/LesnyRumcajs/grpc_bench/tree/master/scenarios/complex_proto
The command to build in your case is ./build.sh dart_grpc_bench
, which will copy the protobuf and build the image.
The CI logs are a bit obscure, but in general this one means that the gRPC server crashed. Did you try to run the built one with ./bench.sh dart_grpc_bench
? Possibly in a loop. The server proposed in another PR suffers from random crashes, perhaps it's the same story here? https://github.com/LesnyRumcajs/grpc_bench/pull/364#issuecomment-1935979213
I did ./build.sh dart_grpc_bench and the server seems to run in the container with no problems. I didn't try doing any gRPC requests. Is there a script I can run that will do that just with the dart_grpc_bench?
I did ./build.sh dart_grpc_bench and the server seems to run in the container with no problems. I didn't try doing any gRPC requests. Is there a script I can run that will do that just with the dart_grpc_bench?
./bench.sh dart_grpc_bench
OK I'm seeing the error here as well. The first time I ran the bench it completed successfully. After that it got the Connection Refused error.
I'll look at the Dart code and see what I can do. I did notice it makes the mistake of starting one too many server threads when GRPC_SERVER_CPUS is 2 or higher.
After working with this for a while I finally figured out where the problem was all along. I made a simple client program to test sending requests to the server and it worked fine. When I put in the full complex proto payload request data none of the replies came back to the client.
By commenting them out I narrowed down the problem to just the f field of the Hello message. This is a float in the proto file and gets turned into a double type in Dart because Dart doesn't have a 32-bit float type. This should work because double is a 64-bit floating point value, but for some reason gRPC isn't making the conversion correctly.
I don't know if this is a bug in the protoc code generation or a bug in the Dart gRPC package. I'll be looking into that now.
Another update. I think the problem went away when I added the fixnum package to the server's pubspec.yaml. Now the server works perfectly using my client test code. I did 600K requests and replies from 6 concurrent instances with all 600K replies coming back. That's with running the server container manually in Docker after running build.sh.
I still get zero replies running bench.sh. I still don't understand where the problem is. It just doesn't make sense that it works perfectly for my test client using the same proto file and same Hello data but not for the bench script.
@owensdj Did you manage to resolve the issue? Or do we abandon the efforts here?
@LesnyRumcajs I looked at this and still don't understand why there is a problem. I made a simple client that sends the payload request message and it works fine with the Dart gRPC server code. 100K requests on 6 threads and all 600K responses come back. Zero errors.
The same server code crashes running the bench script. It gets an unhandled exception "Null check operator used on a null value" from the gRPC package code. It's like the gRPC package is getting a null value where it's asserting there can't be a null.
I'm thinking of asking the Dart community on Discord and/or Reddit to see if they have any ideas.
@LesnyRumcajs This is looking like a bug in the Dart gRPC package. Could you post or point me to the code that sends the gRPC request? That might help the Dart community figure out why the exception happens with the bench RPC call.
@LesnyRumcajs This is looking like a bug in the Dart gRPC package. Could you post or point me to the code that sends the gRPC request? That might help the Dart community figure out why the exception happens with the bench RPC call.
@owensdj gRPC calls are handled exclusively by the ghz, see: https://github.com/LesnyRumcajs/grpc_bench/blob/38875a7e550f7bf6a8bebe57aea3f6da4086652e/bench.sh#L103-L118
How ghz
does it is a more complex question that the maintainer(s) would be better suited to answer.
Summary of changes Changes introduced in this pull request:
Reference issue to close (if applicable) Fixes #146
Other information and links