MartijnR commented 1 year ago

https://www.digitalocean.com/community/tutorials/how-to-use-multithreading-in-node-js

MartijnR commented 1 year ago

Are the individual CPU cores used at/near 100% when failure occurs?

(if so, multithreading code won't be a solution)
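A rough way to answer this while a job runs is to sample per-core load from Node itself. This is only a sketch built on the built-in `os.cpus()`; the helper names `snapshot`, `busyPercent`, and `monitor` are made up for illustration and are not part of Enketo.

```javascript
const os = require('node:os');

// Snapshot the cumulative per-core CPU times (in milliseconds) reported by the OS.
function snapshot() {
  return os.cpus().map((cpu) => ({ ...cpu.times }));
}

// Turn two snapshots into a busy percentage per core for the interval between them.
function busyPercent(before, after) {
  return after.map((times, i) => {
    const prev = before[i];
    const idle = times.idle - prev.idle;
    const total = Object.keys(times).reduce((sum, key) => sum + (times[key] - prev[key]), 0);
    return total === 0 ? 0 : Math.round(100 * (1 - idle / total));
  });
}

// Hypothetical helper: log per-core load once per interval until the timer is cleared.
function monitor(intervalMs = 1000) {
  let previous = snapshot();
  return setInterval(() => {
    const current = snapshot();
    console.log(busyPercent(previous, current).map((p) => `${p}%`).join(' '));
    previous = current;
  }, intervalMs);
}
```

Start it with `const timer = monitor();` before sending requests and stop it with `clearInterval(timer);`. If every core sits near 100%, the job is CPU-bound and extra threads are unlikely to help.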

jkeremian commented 1 year ago

Some stats from testing the archival casebook in my local environment.

server: formPrint-dev

My local mac-pro laptop shows 6 cores:

With 4 threads, the job took around 9 min to complete, max CPU usage 55%

With 6 threads, the job took around 9 min to complete, max CPU usage 55%

With 12 threads, the job took around 9 min to complete, max CPU usage 55%

Regardless of the number of threads used, the total time taken was almost the same, and so was the CPU percentage.

From the CPU chart: at the start of the job, CPU usage peaks close to 55%, then comes down and averages between 27% and 35%.

jkeremian commented 1 year ago

Looks like formPrint-dev server has only 2 cores.

MartijnR commented 1 year ago

Thanks. The remote formprint-dev server only serves the XForms to your local app(s), right?

MartijnR commented 1 year ago

large number of PDF requests sent (called threads above)

My own benchmarking ON A MAC, with Centro running and Enketo in production mode (`npm start`). I ran `sh tools/benchmark-headless-pdf.sh`, which sends 12 requests simultaneously, and repeated each test 3 times.

8 cores - all used by Enketo (cluster workers) (`"max processes": 16`).

It took 21.648 seconds.
It took 21.806 seconds.
It took 22.909 seconds.

See the three spikes in Cores 1, 3, 5, and 7 below (Cores 2, 4, 6, and 8 are barely used).

Green represents CPU utilization by user applications, red represents CPU utilization by Mac OS X itself, and blue indicates low-priority tasks.

4 cores set to be used by Enketo (`"max processes": 4`)

It took 22.016 seconds.
It took 22.043 seconds.
It took 20.652 seconds.

Note that the same 4 cores are used as in the previous set of tests!

1 core set to be used by Enketo (`"max processes": 1`)

It took 21.021 seconds.
It took 20.473 seconds.
It took 21.814 seconds.

Findings

Clearly I am not understanding something about the number of processes when using clusters, as the results are essentially the same with 8, 4 and 1 cluster workers (= Enketo Express processes). Maybe it is something specific to using headless browsers.
4 CPUs are used 100% and 4 are not used at all, which may indicate that the best we can hope for is to halve the time (a 100% throughput improvement), so 11 seconds is our goal.
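The arithmetic behind the 11-second goal can be checked with a few lines. This sketch copies the timings from the three test sets above and assumes, optimistically, that the work would spread perfectly over all cores; `idealTime` is a made-up helper name.

```javascript
// Wall-clock times in seconds, copied from the three test sets above.
const runs = {
  'max processes 16': [21.648, 21.806, 22.909],
  'max processes 4': [22.016, 22.043, 20.652],
  'max processes 1': [21.021, 20.473, 21.814],
};

const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;

// Assumption: perfect scaling, so time shrinks in proportion to usedCores / totalCores.
function idealTime(seconds, usedCores, totalCores) {
  return seconds * (usedCores / totalCores);
}

for (const [label, times] of Object.entries(runs)) {
  const avg = mean(times);
  const ideal = idealTime(avg, 4, 8); // 4 of 8 cores were busy in these tests
  console.log(`${label}: mean ${avg.toFixed(2)} s, ideal on all 8 cores ${ideal.toFixed(2)} s`);
}
```

The three means differ by about a second, and halving any of them lands close to 11 seconds. Real scaling is rarely perfect, so this is an upper bound on the gain rather than a prediction.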

MartijnR commented 1 year ago

I am less confident that the multi-threading approach will work here. We may not be able to run puppeteer in a worker_thread (I am getting an `Illegal invocation` error), but I will try more later.

Also found this, which may be interesting: https://github.com/thomasdondorf/puppeteer-cluster

MartijnR commented 1 year ago

[x] run tests with old PDF code that launches browser for each request
[x] Run on linux server, because the 4 core thing might be a Mac-specific thing
[x] ~~check out the puppeteer-cluster library? (max concurrency)~~ no need, it seems

MartijnR commented 1 year ago

Old PDF Code

CPU core usage is similar to the previous tests.

8 cores - all used by Enketo (cluster workers) (`"max processes": 16`).

It took 28.170 seconds.
It took 26.171 seconds.
It took 23.776 seconds.

1 core set to be used by Enketo (`"max processes": 1`)

It took 29.548 seconds.
It took 23.964 seconds.
It took 23.400 seconds.

Findings

no change in CPU usage (on a Mac)
slightly slower, in particular when configuring Enketo to use more processes

MartijnR commented 1 year ago

Linux server findings:

all cores are used 100% (tested with 2 CPU and 8 CPU server - DigitalOcean dedicated "CPU-Optimized" droplet)
the "max processes" setting in Enketo has no effect
similar behavior with a complex form with lots of logic and a simple widgets form
increasing the headless timeout configuration setting in Enketo can avoid timeouts (though of course there is always some maximum number of simultaneous requests you can send before timeouts return).

MartijnR commented 1 year ago

script used (and a variant using a record, sent to .../instance/view/pdf instead):

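#!/bin/bash
# Relies on bash features (TIMEFORMAT, brace expansion, arrays), so it needs bash rather than a plain POSIX sh.
TIMEFORMAT='It took %R seconds.'

time {
    # Start 12 PDF requests in parallel and remember each background PID.
    for i in {1..12}
        do
            curl --user enketorules: -d "server_url=http://localhost:3000&form_id=LabsReconciliation&ecid=a" http://localhost:8005/oc/api/v1/survey/view/pdf &
            pids[${i}]=$!
        done

    # Block until every request has finished.
    for pid in "${pids[@]}"
        do
            wait "$pid"
        done
}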
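For anyone who wants the HTTP status codes as well as the timing, here is a minimal Node sketch of the same benchmark. It assumes Node 18+ (global `fetch`) and reuses the URL, credentials, and form parameters from the curl command above; the names `requestPdf` and `benchmark` are made up for illustration.

```javascript
// Values copied from the curl command in the shell script above.
const ENDPOINT = 'http://localhost:8005/oc/api/v1/survey/view/pdf';
const BODY = new URLSearchParams({
  server_url: 'http://localhost:3000',
  form_id: 'LabsReconciliation',
  ecid: 'a',
});

// Send one PDF request and resolve to its HTTP status code.
async function requestPdf() {
  const response = await fetch(ENDPOINT, {
    method: 'POST',
    // Same as curl's `--user enketorules:` (user name with an empty password).
    headers: { Authorization: 'Basic ' + Buffer.from('enketorules:').toString('base64') },
    body: BODY,
  });
  await response.arrayBuffer(); // read the whole body, as curl does
  return response.status;
}

// Fire `count` requests at once, then report elapsed seconds and a tally of status codes.
async function benchmark(count, send = requestPdf) {
  const started = process.hrtime.bigint();
  const statuses = await Promise.all(Array.from({ length: count }, () => send()));
  const seconds = Number(process.hrtime.bigint() - started) / 1e9;
  const tally = {};
  for (const status of statuses) tally[status] = (tally[status] || 0) + 1;
  return { seconds, tally };
}
```

Run it with `benchmark(12).then(console.log);`. The `send` parameter exists only so the timing logic can be tried without a running server.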

jkeremian commented 1 year ago

https://github.com/OpenClinica/enketo-express-oc/assets/7663291/ad5b2c54-355c-47ed-9165-dfce4e38fe72

I'm using 4 threads/requests in this test.

MartijnR commented 11 months ago

@MartijnR what is the error code you receive if you send more requests than Enketo can handle?

MartijnR commented 11 months ago

what is the error code you receive if you send more requests than Enketo can handle?

I'm getting an HTTP status code 408 (Request Timeout) response with body:

{
    "message": "PDF generation failed: Navigation timeout of 60000 ms exceeded"
}
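A caller could detect this response and retry after a pause. The following is only a sketch of that idea, not something Enketo or OpenClinica is known to do; `isPdfTimeout` and `withRetry` are hypothetical names, and `send` is assumed to resolve to an object of the form `{ status, body }`.

```javascript
// True when a response looks like the headless timeout shown above.
function isPdfTimeout(status, body) {
  return status === 408
    && typeof body?.message === 'string'
    && body.message.includes('Navigation timeout');
}

// Hypothetical wrapper: call `send` again, with growing pauses, while it keeps timing out.
async function withRetry(send, { attempts = 3, baseDelayMs = 1000 } = {}) {
  let result;
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    result = await send();
    if (!isPdfTimeout(result.status, result.body)) return result;
    if (attempt < attempts - 1) {
      // Back off exponentially so retries do not pile onto an already busy server.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  return result; // still a timeout after the last attempt
}
```

Retrying only helps if the overload is temporary. When all cores are already at 100%, sending fewer requests at once is likely the better fix.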

OpenClinica / enketo-oc

Multi-threading headless API responses? #33
