dasch-swiss / dsp-api

DaSCH Service Platform API
http://admin.dasch.swiss
Apache License 2.0
74 stars 18 forks source link

Some Sipi-to-Knora connections fail under load #298

Open tobiasschweizer opened 7 years ago

tobiasschweizer commented 7 years ago

Since the update of Akka some days ago, some of the Sipi round trips to Knora fail when Knora is asking Sipi for a lot of thumbnails (ca. 400).

The error is caused (or - at least - this is my interpretation) by Knora not answering Sipis request in the pre_flight request (defined in sipi.init-knora.lua):

Request to Knora failed: Error #1175: Connection dropped by peer!

Is that a configuration issue? Is the amount of requests Knora will accept from one location limited?

We are trying to replace the round trip anyway, but I think we should investigate.

benjamingeer commented 7 years ago

Can you try again with the current develop branch?

tobiasschweizer commented 7 years ago

still get the error ...

benjamingeer commented 7 years ago

I can't reproduce this. Can you provide more detailed instructions, and preferably a script, for reproducing it?

tobiasschweizer commented 7 years ago

I was just loading books using SALSAH and them sometimes I got a 401 because Sipi could not connect to Knora.

benjamingeer commented 7 years ago

What do you mean by loading books? Can you please give me precise instructions for reproducing this?

tobiasschweizer commented 7 years ago

I did an extended search for all books in the SALSAH GUI and opened the Narrenschiff or another book.

Then when you open the network console of your developer tools in the browser you can see which of the requests to Sipi fail.

tobiasschweizer commented 7 years ago

I can try this again with my new Mac if it helps you.

benjamingeer commented 7 years ago

I opened Zeitglöcklein des Lebens and got no errors. If you can reproduce this, please let me know exactly how to do it.

tobiasschweizer commented 7 years ago

Ok, I will give it another try and let you know.

tobiasschweizer commented 7 years ago

Ok, I tested it again locally and often I get one thumbnail request that fails.

Steps to reproduce:

Results: Often I get one 401, sometimes two:

401-sipi-knora

Sipi console output:

Request to Knora failed: Error #1078: Could not connect socket! Bad file descriptor

Sipi log:

[2016-11-25 10:52:49.117] [shttps-logger] [error] Internal server error: 
SHTTPS-ERROR at [/Users/tobi/gitreps/Sipi_github/src/SipiImage.cpp: #257] 
Description: Could not read file "./images/knora/7T5skmiwnF7-CNbuuBSG0AR.jpg"!

[2016-11-25 10:52:49.383] [shttps-logger] [warning] Error shutdown socket! Reason: Socket is not connected
[2016-11-25 10:53:28.090] [shttps-logger] [error] Internal server error: 
SHTTPS-ERROR at [/Users/tobi/gitreps/Sipi_github/src/SipiImage.cpp: #257] 
Description: Could not read file "./images/knora/IZSeUBoKQza-CQn491o14x8.jpg"!

[2016-11-25 10:53:28.090] [shttps-logger] [error] Internal server error: 
SHTTPS-ERROR at [/Users/tobi/gitreps/Sipi_github/src/SipiImage.cpp: #257] 
Description: Could not read file "./images/knora/6q9Rdt3fcbu-CRTf2n6jKSq.jpg"!

[2016-11-25 10:54:32.454] [shttps-logger] [warning] Error shutdown socket! Reason: Socket is not connected

The logs made me suspicious. Is the connection problem a result of a Sipi Error in SipiImage? Or are these messages unrelated (the first two timestamps are quite close).

@lrosenth What do you think? I am on develop and up to date with the current version.

lrosenth commented 7 years ago

Hast Du eine Ahnung, weshalb SIPI das File nicht findet??

Lukas --

Prof. Dr. Lukas Rosenthaler Head of "Digital Humanities Lab", University of Basel Bernoullistrasse 32 CH-4056 Basel, Switzerland

Tel: + 41 61 267 0490 Fax: +41 61 267 3855 http://www.dhlab.unibas.ch

On 25 Nov 2016, at 11:03, Tobias Schweizer notifications@github.com wrote:

Ok, I tested it again locally and often I get one thumbnail request that fails.

Steps to reproduce:

clear your browser's cache each time before you load a book login to Salsah and open a book with about 400 pages observe the browser's network console check Sipi's console output Results: Often I get one 401, sometimes two: https://cloud.githubusercontent.com/assets/6000023/20621238/e8368ce8-b2fd-11e6-8de8-ce45b8c9b040.png Sipi console output:

Request to Knora failed: Error #1078: Could not connect socket! Bad file descriptor Sipi log:

[2016-11-25 10:52:49.117] [shttps-logger] [error] Internal server error: SHTTPS-ERROR at [/Users/tobi/gitreps/Sipi_github/src/SipiImage.cpp: #257] Description: Could not read file "./images/knora/7T5skmiwnF7-CNbuuBSG0AR.jpg"!

[2016-11-25 10:52:49.383] [shttps-logger] [warning] Error shutdown socket! Reason: Socket is not connected [2016-11-25 10:53:28.090] [shttps-logger] [error] Internal server error: SHTTPS-ERROR at [/Users/tobi/gitreps/Sipi_github/src/SipiImage.cpp: #257] Description: Could not read file "./images/knora/IZSeUBoKQza-CQn491o14x8.jpg"!

[2016-11-25 10:53:28.090] [shttps-logger] [error] Internal server error: SHTTPS-ERROR at [/Users/tobi/gitreps/Sipi_github/src/SipiImage.cpp: #257] Description: Could not read file "./images/knora/6q9Rdt3fcbu-CRTf2n6jKSq.jpg"!

[2016-11-25 10:54:32.454] [shttps-logger] [warning] Error shutdown socket! Reason: Socket is not connected The logs made me suspicious. Is the connection problem a result of a Sipi Error in SipiImage?

@lrosenth https://github.com/lrosenth What do you think? I am on develop and up to date with the current version.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dhlab-basel/Knora/issues/298#issuecomment-262921856, or mute the thread https://github.com/notifications/unsubscribe-auth/AFN9zDq1CYlUoZQwedvDHb7Pnz4nR5nbks5rBrKMgaJpZM4KjhYE.

tobiasschweizer commented 7 years ago

The error seems to occur with different files (randomly) each time.

lrosenth commented 7 years ago

Same problem we had before??

--

Prof. Dr. Lukas Rosenthaler Head of "Digital Humanities Lab", University of Basel Bernoullistrasse 32 CH-4056 Basel, Switzerland

Tel: + 41 61 267 0490 Fax: +41 61 267 3855 http://www.dhlab.unibas.ch

On 25 Nov 2016, at 11:26, Tobias Schweizer notifications@github.com wrote:

The error seems to occur with different files (randomly) each time.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dhlab-basel/Knora/issues/298#issuecomment-262926308, or mute the thread https://github.com/notifications/unsubscribe-auth/AFN9zBP45-GBfDaJ0wvUgc9jtZgWduCiks5rBrfcgaJpZM4KjhYE.

tobiasschweizer commented 7 years ago

I am not sure if the two problems here are related or not. But yes, we had a similar issue before when reading jpeg files.

benjamingeer commented 7 years ago

After trying many times, I got the 401 Unauthorized error in the browser, along with this error in the Sipi log:

Request to Knora failed: Error #1186: Couldn't read HTTP header: Connection dropped by peer!

Also, once I got a Bad file descriptor in the Sipi log, but with no error in the browser.

I haven't been able to reproduce any of the Sipi error messages @tobiasschweizer got.

benjamingeer commented 7 years ago

I tried hammering Knora's Sipi authorisation route with ab, and didn't get any errors:

iml-wall-e:knora benjamingeer$ ab -n 100000 -c 100 http://localhost:3333/v1/files/incunabula_0000003328.jp2
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests

Server Software:        akka-http/2.4.11
Server Hostname:        localhost
Server Port:            3333

Document Path:          /v1/files/incunabula_0000003328.jp2
Document Length:        258 bytes

Concurrency Level:      100
Time taken for tests:   276.018 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      41200000 bytes
HTML transferred:       25800000 bytes
Requests per second:    362.30 [#/sec] (mean)
Time per request:       276.018 [ms] (mean)
Time per request:       2.760 [ms] (mean, across all concurrent requests)
Transfer rate:          145.77 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    3   7.2      1     143
Processing:     7  272 246.3    193   14044
Waiting:        1  269 246.5    190   14044
Total:          7  275 246.1    195   14045

Percentage of the requests served within a certain time (ms)
  50%    195
  66%    239
  75%    278
  80%    301
  90%    444
  95%    975
  98%   1085
  99%   1149
 100%  14045 (longest request)

So I wonder if this is actually a Sipi bug.

benjamingeer commented 7 years ago

With more than 130 concurrent connections, I always get Connection reset by peer from Akka. I've asked them about it: https://github.com/akka/akka/issues/21908

benjamingeer commented 7 years ago

https://groups.google.com/forum/#!topic/akka-user/o-jU4C51b3E

benjamingeer commented 7 years ago

We should try wrk instead of ab:

https://github.com/wg/wrk

lrosenth commented 7 years ago

On wich system are You doing the tests? Osx or linux??

Von meinem iPhone gesendet

Am 29.11.2016 um 16:03 schrieb Benjamin Geer notifications@github.com<mailto:notifications@github.com>:

I tried hammering Knora's Sipi authorisation route with ab, and didn't get any errors:

iml-wall-e:knora benjamingeer$ ab -n 100000 -c 100 http://localhost:3333/v1/files/incunabula_0000003328.jp2 This is ApacheBench, Version 2.3 <$Revision: 1706008 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient) Completed 10000 requests Completed 20000 requests Completed 30000 requests Completed 40000 requests Completed 50000 requests Completed 60000 requests Completed 70000 requests Completed 80000 requests Completed 90000 requests Completed 100000 requests Finished 100000 requests

Server Software: akka-http/2.4.11 Server Hostname: localhost Server Port: 3333

Document Path: /v1/files/incunabula_0000003328.jp2 Document Length: 258 bytes

Concurrency Level: 100 Time taken for tests: 276.018 seconds Complete requests: 100000 Failed requests: 0 Total transferred: 41200000 bytes HTML transferred: 25800000 bytes Requests per second: 362.30 [#/sec] (mean) Time per request: 276.018 [ms] (mean) Time per request: 2.760 [ms] (mean, across all concurrent requests) Transfer rate: 145.77 [Kbytes/sec] received

Connection Times (ms) min mean[+/-sd] median max Connect: 0 3 7.2 1 143 Processing: 7 272 246.3 193 14044 Waiting: 1 269 246.5 190 14044 Total: 7 275 246.1 195 14045

Percentage of the requests served within a certain time (ms) 50% 195 66% 239 75% 278 80% 301 90% 444 95% 975 98% 1085 99% 1149 100% 14045 (longest request)

So I wonder if this is actually a Sipi bug.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/dhlab-basel/Knora/issues/298#issuecomment-263593998, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AFN9zP9wyxMEwa0raCvvBkeHK1-T7oPnks5rDD7LgaJpZM4KjhYE.

benjamingeer commented 7 years ago

On OS X. I have two hypotheses about what might be happening:

  1. Sipi drops a connection and incorrectly says that Knora dropped it.
  2. Sipi makes more than 130 concurrent requests to Knora, and Knora really drops a connection.

If it's (2), I think we need a way to limit the number of concurrent requests that Sipi makes to Knora.

lrosenth commented 7 years ago

Ee have to look

Von meinem iPhone gesendet

hat might be happening:

  1. Sipi drops a connection and incorrectly says that Knora dropped it.

Noooo ! ;-) we have to check. I assume that the connection from Sipi to knora cones from the lua

  1. Sipi makes more than 130 concurrent requests to Knora, and Knora really drops a connection.

I limit the number of parallel requests. So I don't think... Could it be the limit of open file-id's on os x??

Interesting! We have to dig... Soon arriving in Basel from Laussnne,

Lukas

If it's (2), I think we need a way to limit the number of concurrent requests that Sipi makes to Knora.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/dhlab-basel/Knora/issues/298#issuecomment-263663628, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AFN9zO97b1Zc6ZfaUBF3NV0agwIBxmuNks5rDHWAgaJpZM4KjhYE.

benjamingeer commented 7 years ago

The apparent limit of 130 concurrent connections on OS X seems to be a kernel configuration issue, not an Akka issue. Following the advice here and here seems to work. We need to see if this has any effect on the Sipi-Knora interaction problem.

benjamingeer commented 7 years ago

After adjusting the Akka HTTP configuration and the Mac OS kernel configuration, I can no longer reproduce this problem on Mac OS X. @tobiasschweizer, could you try pull request #338? Make sure to run the script to adjust your Mac OS X kernel configuration.

benjamingeer commented 7 years ago

With the configuration in #338, Knora handles 800 concurrent requests to the /v1/files route without errors on my machine, using GraphDB. More than that, and GraphDB starts refusing connections. (There might be some relevant GraphDB configuration to do.)

benjamingeer commented 7 years ago

Since #338 doesn't seem to have solved this, perhaps it's a Sipi issue after all.

benjamingeer commented 7 years ago

https://github.com/dhlab-basel/Sipi/issues/84