dgrijalva / jwt-go

ARCHIVE - Golang implementation of JSON Web Tokens (JWT). This project is now maintained at:
https://github.com/golang-jwt/jwt
MIT License
10.78k stars 994 forks source link

Running several SignedString(privateKey) in goroutines is almost the same as running in serial. #61

Closed wari closed 9 years ago

wari commented 9 years ago

Hi there, just a note, not sure if this would be a bug, or just a normal behaviour that's to be expected with signing tokens via RSxxx.

On my system, where I need to sign one or many token at once. When I sign a token, it may take roughly 200-250ms for jwt.SignedString(). Of course, depending on what machine I use, it may be faster or slower due to CPU.

But when I tried signing 3 different tokens at the same time, using the same privateKey, the results are almost 3x longer.

The source for my implementation in psuedocode-ish looks like this:

for item := range stuff {
  i := doSomethingWith(item)
  go func() {
    doBasicClaims()
    AddPayloadClaim(i)
    s := jwt.SignedString(privateKey)
    sendOffToken(s)
    getBackData()
  }
}

So for this, all the other stuff took only a few ms, but the goroutines will take a total of 600 to 900 ms on a request that produces 3 iterations.

Here's what the verbose log looks like

Started POST /settings for localhost:10000
Processing all stuff 915.559µs
Iterating through 3 settings request 933.57µs
Finished processing all stuff, and waiting for return 954.803µs (for range:)
Processing AAAA 989.179µs
Before token signing with RSA256 1.175348ms
Processing BBBB 11.132295ms
Before token signing with RSA256 11.341492ms
Processing CCCC 22.01571ms
Before token signing with RSA256 22.350366ms
Token signing for BBBB took 613.113249ms current time @ 624.455326ms
Sent Request for BBBB 624.532368ms
Token signing for AAAA took 658.879584ms current time @ 660.055904ms
Sent Request for AAAA 660.175747ms
Token signing for CCCC took 656.992226ms current time @ 679.343004ms
Sent Request for CCCC 679.402761ms
Got back reply from AAAA 684.077767ms travel time took 23.902779ms
Finished Processing AAAA 684.188976ms
Got back reply from CCCC 686.740498ms travel time took 7.338188ms
Finished Processing CCCC 686.815321ms
Got back reply from BBBB 767.306613ms travel time took 142.774982ms
Finished Processing BBBB 767.355212ms
Completed 200 OK in 767.547319ms: served 1511 bytes

So, the question is, is this expected behaviour that I'm observing? I would be afraid to do this on hundreds of simultaneous token creations with RSA.

Just a note, the getBackData() parts are actually tokens signed with HS256 over a network, and the slowest one that took 142ms (BBBB), is actually from a Raspberry Pi. Which means, it took less time to verify an RSA signature, and generate a SignedString() with HS256 than to actually sign with RSA on a faster machine.

wari commented 9 years ago

Oh, and also, the results are more or less the same with GOMAXPROCS set to 1 or 8, etc. Does not seem to matter at all.

dgrijalva commented 9 years ago

I just pushed 901c439dfe4a1c2f483c70ec0d703cc8219c9680 which benchmarks the HMAC signing method. It seems to behave as expected. Go ahead and run it for yourself and let me know what you see. I'm not familiar with the performance of parallel Go on arm chips. Could be something there?

Artax:jwt-go dgrijalva$ go test -bench . -cpu 1
PASS
BenchmarkHMACSigning       50000         41500 ns/op
ok      github.com/dgrijalva/jwt-go 3.619s
Artax:jwt-go dgrijalva$ go test -bench . -cpu 2
PASS
BenchmarkHMACSigning-2    100000         24001 ns/op
ok      github.com/dgrijalva/jwt-go 3.785s
Artax:jwt-go dgrijalva$ go test -bench . -cpu 4
PASS
BenchmarkHMACSigning-4    100000         14235 ns/op
ok      github.com/dgrijalva/jwt-go 2.723s
Artax:jwt-go dgrijalva$ go test -bench . -cpu 16
PASS
BenchmarkHMACSigning-16   100000         15301 ns/op
ok      github.com/dgrijalva/jwt-go 2.816s
dgrijalva commented 9 years ago
Artax:jwt-go dgrijalva$ go test -bench . -cpu 1 -benchtime 5s
PASS
BenchmarkHS256Signing    1000000          6329 ns/op
BenchmarkHS384Signing    1000000          7121 ns/op
BenchmarkHS512Signing    1000000          7725 ns/op
BenchmarkRS256Signing    5000000          3339 ns/op
BenchmarkRS384Signing    5000000          3331 ns/op
BenchmarkRS512Signing    5000000          3327 ns/op
ok      github.com/dgrijalva/jwt-go 82.747s
Artax:jwt-go dgrijalva$ go test -bench . -cpu 2 -benchtime 5s
PASS
BenchmarkHS256Signing-2  5000000          3692 ns/op
BenchmarkHS384Signing-2  2000000          4287 ns/op
BenchmarkHS512Signing-2  2000000          4494 ns/op
BenchmarkRS256Signing-2  5000000          2019 ns/op
BenchmarkRS384Signing-2  5000000          1975 ns/op
BenchmarkRS512Signing-2  5000000          1967 ns/op
ok      github.com/dgrijalva/jwt-go 85.383s
Artax:jwt-go dgrijalva$ go test -bench . -cpu 4 -benchtime 5s
PASS
BenchmarkHS256Signing-4  5000000          2716 ns/op
BenchmarkHS384Signing-4  5000000          3127 ns/op
BenchmarkHS512Signing-4  5000000          3150 ns/op
BenchmarkRS256Signing-4 10000000          1390 ns/op
BenchmarkRS384Signing-4 10000000          1399 ns/op
BenchmarkRS512Signing-4 10000000          1420 ns/op
ok      github.com/dgrijalva/jwt-go 101.527s
wari commented 9 years ago

Let me try this out. Will get back to you as soon as I can. Thanks

wari commented 9 years ago

Actually, your numbers are way off than mine. For this test I'm using a MacBook pro. The RS numbers should not look so fast (like the one you're having) IMO, or is there something wrong with my machine? Or it could be you have them fast accelerated crypto cards?:

jwt-go git:master ❯ go test -bench . -cpu 1 -benchtime 5s
PASS
BenchmarkHS256Signing    1000000          9059 ns/op
BenchmarkHS384Signing    1000000         10039 ns/op
BenchmarkHS512Signing    1000000         10403 ns/op
BenchmarkRS256Signing       1000       9082389 ns/op
BenchmarkRS384Signing       1000       9231708 ns/op
BenchmarkRS512Signing       1000       9142034 ns/op
ok      github.com/dgrijalva/jwt-go 65.712s
jwt-go git:master ❯ go test -bench . -cpu 2 -benchtime 5s
PASS
BenchmarkHS256Signing-2  1000000          5392 ns/op
BenchmarkHS384Signing-2  1000000          5938 ns/op
BenchmarkHS512Signing-2  1000000          6461 ns/op
BenchmarkRS256Signing-2     2000       4783875 ns/op
BenchmarkRS384Signing-2     2000       4811723 ns/op
BenchmarkRS512Signing-2     2000       4799514 ns/op
ok      github.com/dgrijalva/jwt-go 64.270s
jwt-go git:master ❯ go test -bench . -cpu 4 -benchtime 5s
PASS
BenchmarkHS256Signing-4  2000000          3518 ns/op
BenchmarkHS384Signing-4  2000000          4565 ns/op
BenchmarkHS512Signing-4  2000000          4604 ns/op
BenchmarkRS256Signing-4     2000       2965792 ns/op
BenchmarkRS384Signing-4     2000       2942990 ns/op
BenchmarkRS512Signing-4     2000       2941127 ns/op
ok      github.com/dgrijalva/jwt-go 58.278s

This is what the benchmark looked like on the server in question (intel, ubuntu):

wari@salt:~/src/github.com/dgrijalva/jwt-go$ go test -bench . -cpu 1 -benchtime 5s
PASS
BenchmarkHS256Signing     500000         17296 ns/op
BenchmarkHS384Signing     500000         19656 ns/op
BenchmarkHS512Signing     300000         21234 ns/op
BenchmarkRS256Signing        500      14924048 ns/op
BenchmarkRS384Signing        500      15312750 ns/op
BenchmarkRS512Signing        500      14867267 ns/op
ok      github.com/dgrijalva/jwt-go 67.362s
wari@salt:~/src/github.com/dgrijalva/jwt-go$ go test -bench . -cpu 2 -benchtime 5s
PASS
BenchmarkHS256Signing-2   500000         14936 ns/op
BenchmarkHS384Signing-2  1000000         15754 ns/op
BenchmarkHS512Signing-2   500000         17111 ns/op
BenchmarkRS256Signing-2      500      10069299 ns/op
BenchmarkRS384Signing-2      500      10971741 ns/op
BenchmarkRS512Signing-2      500      10811426 ns/op
ok      github.com/dgrijalva/jwt-go 56.678s
wari@salt:~/src/github.com/dgrijalva/jwt-go$ go test -bench . -cpu 4 -benchtime 5s
PASS
BenchmarkHS256Signing-4  1000000         11716 ns/op
BenchmarkHS384Signing-4   500000         14896 ns/op
BenchmarkHS512Signing-4   500000         15034 ns/op
BenchmarkRS256Signing-4     1000       7667207 ns/op
BenchmarkRS384Signing-4     1000       8193615 ns/op
BenchmarkRS512Signing-4     1000       7391248 ns/op
ok      github.com/dgrijalva/jwt-go 66.040s
wari commented 9 years ago

BTW, I've got my very informal test here at https://github.com/wari/jwt-testing which I believe is more "real world" at least in my use case.

So, what I gather from my own test is that GOMAXPROCS matters a lot. For 10 requests, I can shave off time to sign by half or a third. But one thing to note is that they all finish around the same time. If GOMAXPROCS is set to 1, if you sign 10 concurrently, or serially, the net effect is you will get the results at about the same time.

PS: Again, your numbers looks way off, and also I don't really understand the go benchmark :)

dgrijalva commented 9 years ago

The numbers I posted were wrong. The signing was failing and I wasn't checking for it. I fixed the test, but forgot to update the ticket.

-dave

On Apr 12, 2015, at 8:24 PM, Wari Wahab notifications@github.com wrote:

BTW, I've got my very informal test here at https://github.com/wari/jwt-testing which I believe is more "real world" at least in my use case.

So, what I gather from my own test is that GOMAXPROCS matters a lot. For 10 requests, I can shave off time to sign by half or a third. But one thing to note is that they all finish around the same time. If GOMAXPROCS is set to 1, if you sign 10 concurrently, or serially, the net effect is you will get the results at about the same time.

PS: Again, your numbers looks way off, and also I don't really understand the go benchmark :)

— Reply to this email directly or view it on GitHub.

dgrijalva commented 9 years ago

Looks like you are seeing the benefits of multiple CPUs here. Look at the ns/op values as you increase CPUs.

-dave

On Apr 12, 2015, at 7:33 PM, Wari Wahab notifications@github.com wrote:

Actually, your numbers are way off than mine. For this test I'm using a MacBook pro. The RS numbers should not look so fast (like the one you're having) IMO, or is there something wrong with my machine? Or it could be you have them fast accelerated crypto cards?:

jwt-go git:master ❯ go test -bench . -cpu 1 -benchtime 5s PASS BenchmarkHS256Signing 1000000 9059 ns/op BenchmarkHS384Signing 1000000 10039 ns/op BenchmarkHS512Signing 1000000 10403 ns/op BenchmarkRS256Signing 1000 9082389 ns/op BenchmarkRS384Signing 1000 9231708 ns/op BenchmarkRS512Signing 1000 9142034 ns/op ok github.com/dgrijalva/jwt-go 65.712s jwt-go git:master ❯ go test -bench . -cpu 2 -benchtime 5s PASS BenchmarkHS256Signing-2 1000000 5392 ns/op BenchmarkHS384Signing-2 1000000 5938 ns/op BenchmarkHS512Signing-2 1000000 6461 ns/op BenchmarkRS256Signing-2 2000 4783875 ns/op BenchmarkRS384Signing-2 2000 4811723 ns/op BenchmarkRS512Signing-2 2000 4799514 ns/op ok github.com/dgrijalva/jwt-go 64.270s jwt-go git:master ❯ go test -bench . -cpu 4 -benchtime 5s PASS BenchmarkHS256Signing-4 2000000 3518 ns/op BenchmarkHS384Signing-4 2000000 4565 ns/op BenchmarkHS512Signing-4 2000000 4604 ns/op BenchmarkRS256Signing-4 2000 2965792 ns/op BenchmarkRS384Signing-4 2000 2942990 ns/op BenchmarkRS512Signing-4 2000 2941127 ns/op ok github.com/dgrijalva/jwt-go 58.278s This is what the benchmark looked like on the server in question (intel, ubuntu):

wari@salt:~/src/github.com/dgrijalva/jwt-go$ go test -bench . -cpu 1 -benchtime 5s PASS BenchmarkHS256Signing 500000 17296 ns/op BenchmarkHS384Signing 500000 19656 ns/op BenchmarkHS512Signing 300000 21234 ns/op BenchmarkRS256Signing 500 14924048 ns/op BenchmarkRS384Signing 500 15312750 ns/op BenchmarkRS512Signing 500 14867267 ns/op ok github.com/dgrijalva/jwt-go 67.362s wari@salt:~/src/github.com/dgrijalva/jwt-go$ go test -bench . -cpu 2 -benchtime 5s PASS BenchmarkHS256Signing-2 500000 14936 ns/op BenchmarkHS384Signing-2 1000000 15754 ns/op BenchmarkHS512Signing-2 500000 17111 ns/op BenchmarkRS256Signing-2 500 10069299 ns/op BenchmarkRS384Signing-2 500 10971741 ns/op BenchmarkRS512Signing-2 500 10811426 ns/op ok github.com/dgrijalva/jwt-go 56.678s wari@salt:~/src/github.com/dgrijalva/jwt-go$ go test -bench . -cpu 4 -benchtime 5s PASS BenchmarkHS256Signing-4 1000000 11716 ns/op BenchmarkHS384Signing-4 500000 14896 ns/op BenchmarkHS512Signing-4 500000 15034 ns/op BenchmarkRS256Signing-4 1000 7667207 ns/op BenchmarkRS384Signing-4 1000 8193615 ns/op BenchmarkRS512Signing-4 1000 7391248 ns/op ok github.com/dgrijalva/jwt-go 66.040s — Reply to this email directly or view it on GitHub.

wari commented 9 years ago

Anyway, thanks a lot for the benchmark code, I now know to add GOMAXPROCS to my server, it is still very heavy in CPU usage, and if I do run into the problem of having loaded with too many calls at once I might use the HS256 signing instead of RSA signing.

Oh, and in my dumbness, the reason why I wrote in the first place is due to the fact that I used GOMAXPROC (without the 'S') to run.

Thanks again :smile: