Fix ECONNRESET errors that happen when seeding the DB

tschaffter commented 3 years ago

While working on #82, I encountered the following issue:

I now get ECONNRESET errors at random. At first they appeared after submitting a varying number of persons; they now also affect other objects such as Organization.

ECONNRESET (Connection reset by peer): A connection was forcibly closed by a peer. This normally results from a loss of the connection on the remote socket due to a timeout or reboot. Commonly encountered via the http and net modules.

See https://nodejs.org/api/errors.html#errors_common_system_errors

[HPM] Rewriting path from "/api/organizations?organizationId=clinical-proteomic-tumor-analysis-consortium" to "/organizations?organizationId=clinical-proteomic-tumor-analysis-consortium"
[HPM] POST /api/organizations?organizationId=clinical-proteomic-tumor-analysis-consortium ~> http://localhost:8080/api/v1
[HPM] Rewriting path from "/api/organizations?organizationId=columbia-university" to "/organizations?organizationId=columbia-university"
[HPM] POST /api/organizations?organizationId=columbia-university ~> http://localhost:8080/api/v1
[HPM] Error occurred while trying to proxy request /organizations?organizationId=cincinnati-childrens-hospital-medical-center from localhost:4200 to http://localhost:8080/api/v1 (ECONNRESET) (https://nodejs.org/api/errors.html#errors_common_system_errors)
[HPM] Error occurred while trying to proxy request /organizations?organizationId=celgene from localhost:4200 to http://localhost:8080/api/v1 (ECONNRESET) (https://nodejs.org/api/errors.html#errors_common_system_errors)
[HPM] Error occurred while trying to proxy request /organizations?organizationId=clinical-proteomic-tumor-analysis-consortium from localhost:4200 to http://localhost:8080/api/v1 (ECONNRESET) (https://nodejs.org/api/errors.html#errors_common_system_errors)
[HPM] Error occurred while trying to proxy request /organizations?organizationId=center-for-research-computing from localhost:4200 to http://localhost:8080/api/v1 (ECONNRESET) (https://nodejs.org/api/errors.html#errors_common_system_errors)
[HPM] Error occurred while trying to proxy request /organizations?organizationId=columbia-university from localhost:4200 to http://localhost:8080/api/v1 (ECONNRESET) (https://nodejs.org/api/errors.html#errors_common_system_errors)
[HPM] Rewriting path from "/api/organizations?organizationId=consejo-superior-de-investigaciones-cientificas" to "/organizations?organizationId=consejo-superior-de-investigaciones-cientificas"
[HPM] POST /api/organizations?organizationId=consejo-superior-de-investigaciones-cientificas ~> http://localhost:8080/api/v1
[HPM] Rewriting path from "/api/organizations?organizationId=corevitas" to "/organizations?organizationId=corevitas"
[HPM] POST /api/organizations?organizationId=corevitas ~> http://localhost:8080/api/v1
[HPM] Error occurred while trying to proxy request /organizations?organizationId=consejo-superior-de-investigaciones-cientificas from localhost:4200 to http://localhost:8080/api/v1 (ECONNRESET) (https://nodejs.org/api/errors.html#errors_common_system_errors)
[HPM] Error occurred while trying to proxy request /organizations?organizationId=corevitas from localhost:4200 to http://localhost:8080/api/v1 (ECONNRESET) (https://nodejs.org/api/errors.html#errors_common_system_errors)
[HPM] Rewriting path from "/api/organizations?organizationId=defense-advanced-research-projects-agency" to "/organizations?organizationId=defense-advanced-research-projects-agency"
[HPM] POST /api/organizations?organizationId=defense-advanced-research-projects-agency ~> http://localhost:8080/api/v1
[HPM] Rewriting path from "/api/organizations?organizationId=corrona" to "/organizations?organizationId=corrona"
[HPM] POST /api/organizations?organizationId=corrona ~> http://localhost:8080/api/v1
[HPM] Rewriting path from "/api/organizations?organizationId=dana-farber-cancer-institute" to "/organizations?organizationId=dana-farber-cancer-institute"

tschaffter commented 3 years ago

I'm almost sure that the issue happens because we send too many requests to the API service in parallel using RxJS forkJoin. ;-)

The two screenshots below show that the first few requests (tag creation) are fine: they spend little to no time in the "blocked" request state (red) and most of their time in "waiting" (blue). The second screenshot shows that requests then spend most of their time blocked (red), and ultimately the API service resets the connection. I'm now looking at a way to throttle the number of concurrent submissions (possibly using this approach).

tschaffter commented 3 years ago

RxJS solution (client side)

I found a utility function that behaves like forkJoin and lets you specify the number of Observables to process concurrently. This solution is implemented in 0f500c1 to post 5 tags at a time. The first screenshot shows that no request spends significant time in the "blocked" state. The second screenshot was obtained with the concurrency set to 20, which leads some requests to spend time in the blocked state. There, the 7th request is the first one that spends time in the blocked state, so 6 concurrent requests would be optimal.

The results shown in this section are obtained when sending requests directly to a Flask API service (development server). See below for the results obtained using uWSGI + Flask (production server).
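For readers who want to see the bounded-concurrency idea in isolation, here is a minimal sketch. It is not the utility function from 0f500c1: it uses plain Promises instead of RxJS so that it runs without dependencies, and the name runWithConcurrency is hypothetical. In RxJS, the same effect is commonly obtained through the concurrent parameter of mergeMap.

```typescript
// Hypothetical helper (not the function used in 0f500c1): runs task
// factories with at most `limit` of them in flight at any time and
// resolves with the results in input order, similar to forkJoin.
async function runWithConcurrency<T>(
  tasks: ReadonlyArray<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: T[] = new Array<T>(tasks.length);
  let next = 0; // index of the next task that has not been started yet

  // Each worker keeps pulling the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  // Starting `limit` workers caps the number of concurrent tasks at `limit`.
  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With this shape, posting 5 tags at a time would look like runWithConcurrency(tagTasks, 5), where tagTasks is an assumed array of functions that each send one POST request and return a Promise.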

tschaffter commented 3 years ago

Using uWSGI (server side)

This time I decided to run the ROCC API service using Docker. This production-oriented approach places a uWSGI server in front of the Flask app to obtain better performance. The screenshot below shows that up to 20 tags can be pushed concurrently without any of them spending time in the blocked state. Note that slightly different results can be obtained each time the app is reloaded.

By default, I use processes = 1 in server/uwsgi.ini. I tested with processes = 5 and definitely saw fewer test runs that featured blocked requests. Ideally, the value of processes should not exceed the number of CPU cores minus a margin reserved for other important processes (OS, MongoDB, etc.).
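The sizing rule above can be written down as a small calculation. This is only a sketch of the arithmetic; the function name and the idea of expressing the margin as a number of reserved cores are assumptions, not something taken from the ROCC configuration.

```typescript
// Hypothetical helper: upper bound for the uWSGI `processes` setting,
// computed as CPU cores minus a margin kept free for the OS, MongoDB, etc.
function maxUwsgiProcesses(cpuCores: number, reservedCores: number): number {
  if (cpuCores < 1 || reservedCores < 0) {
    throw new RangeError("cpuCores must be >= 1 and reservedCores >= 0");
  }
  // Always keep at least one worker process, even on very small machines.
  return Math.max(1, Math.floor(cpuCores) - Math.floor(reservedCores));
}
```

For example, on a machine with 8 cores and 2 cores reserved, the bound is 6, so processes = 5 would fit within it.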

tschaffter commented 3 years ago

@thomasyu888 This ticket provides some insights into ECONNRESET errors that I encountered while working on the ROCC. This is the same issue that we encountered when seeding the NLP Sandbox Data Node and that we solved by introducing a delay between requests (I implemented both API services in the same way).
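As a rough illustration of the delay-based workaround mentioned above, the sketch below sends requests one at a time and pauses between them. It assumes the requests were issued sequentially, which the comment does not state; the names sleep and runSequentiallyWithDelay are hypothetical and this is not the code used for the NLP Sandbox Data Node.

```typescript
// Resolves after roughly `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs the tasks strictly one after another and waits
// `delayMs` between the end of one task and the start of the next.
async function runSequentiallyWithDelay<T>(
  tasks: ReadonlyArray<() => Promise<T>>,
  delayMs: number,
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < tasks.length; i++) {
    if (i > 0) {
      await sleep(delayMs); // no delay before the very first request
    }
    results.push(await tasks[i]());
  }
  return results;
}
```

Compared with limiting concurrency, a fixed delay is simpler but wastes time when the server could have accepted the next request sooner.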

Here TypeScript-RxJS is a great solution for performing batch operations like seeding a database.

Sage-Bionetworks / rocc-app

Fix ECONNRESET errors that happen when seeding the DB #83
