OpenClinica / enketo-oc

OpenClinica's fork of the Enketo web forms monorepo
Apache License 2.0

Multi-threading headless API responses? #33

Open MartijnR opened 1 year ago

MartijnR commented 1 year ago

https://www.digitalocean.com/community/tutorials/how-to-use-multithreading-in-node-js

MartijnR commented 1 year ago

Are the individual CPU cores used at/near 100% when failure occurs?

(If so, multi-threading won't be a solution.)
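On a Linux host, one way to answer this is to sample `/proc/stat` twice and compute each core's busy share over the interval. The `mpstat -P ALL 1` command from the sysstat package reports the same thing if it is installed. The sketch below is Linux-only, `per_core_busy` is a hypothetical name, and it counts iowait as idle, which is one common convention among several.

```bash
#!/usr/bin/env bash
# Linux-only sketch (hypothetical helper): print each core's busy percentage
# over a sampling interval, computed from two snapshots of /proc/stat.
per_core_busy() {
    local interval="${1:-1}" before after
    before="$(grep '^cpu[0-9]' /proc/stat)"   # per-core lines only (cpu0, cpu1, ...)
    sleep "$interval"
    after="$(grep '^cpu[0-9]' /proc/stat)"
    printf '%s\n%s\n' "$before" "$after" | awk '
        {
            # Fields 2-9: user nice system idle iowait irq softirq steal (clock ticks).
            total = 0
            for (f = 2; f <= 9 && f <= NF; f++) total += $f
            idle = $5 + $6                       # idle + iowait count as "not busy"
            if (!($1 in t0)) {                   # first snapshot: remember the counters
                t0[$1] = total; idle0[$1] = idle; order[++n] = $1
                next
            }
            dt = total - t0[$1]; di = idle - idle0[$1]
            pct = (dt > 0) ? 100 * (dt - di) / dt : 0
            busy[$1] = (pct < 0) ? 0 : pct
        }
        END { for (k = 1; k <= n; k++) printf "%s %.0f%%\n", order[k], busy[order[k]] }'
}

# Example: take a one-second sample while the PDF requests are being processed.
per_core_busy 1
```

A core that stays near 100% in every sample while requests fail would point to CPU saturation rather than a lack of threads.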

jkeremian commented 1 year ago

Some stats from testing the archival casebook in my local environment.

server: formPrint-dev

My local MacBook Pro shows 6 cores:

4 threads: the job took about 9 min to complete; max CPU usage 55%

6 threads: the job took about 9 min to complete; max CPU usage 55%

12 threads: the job took about 9 min to complete; max CPU usage 55%

Regardless of the number of threads used, the total time taken and the CPU percentage were almost the same.

From the CPU chart: at the start of the job, CPU usage peaks close to 55%, then comes down and averages between 27% and 35%.

[Screenshot 2023-08-22 at 5:18:05 PM: CPU usage chart]

jkeremian commented 1 year ago

Looks like formPrint-dev server has only 2 cores.
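To check how many cores a machine exposes (2 on formPrint-dev versus 6 on the laptop), a small helper can ask the OS directly. This is a sketch assuming a Linux host with GNU coreutils' `nproc` or a macOS host with `sysctl`, falling back to `getconf _NPROCESSORS_ONLN`; the function name `cpu_cores` is hypothetical. All three report logical CPUs, so hyper-threads are counted as cores.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of logical CPUs on this host.
cpu_cores() {
    if command -v nproc >/dev/null 2>&1; then
        nproc                          # Linux: CPUs available to this process (GNU coreutils)
    elif sysctl -n hw.ncpu >/dev/null 2>&1; then
        sysctl -n hw.ncpu              # macOS / BSD
    else
        getconf _NPROCESSORS_ONLN      # fallback: online processors
    fi
}

echo "Logical CPUs: $(cpu_cores)"
```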

[Screenshot 2023-08-23 at 11:49:16 AM]
MartijnR commented 1 year ago

Thanks. The remote formPrint-dev server only serves the XForms to your local app(s), right?

MartijnR commented 1 year ago

Scenario: a large number of PDF requests sent simultaneously (called "threads" above)

My own benchmarking ON A MAC, with Centro running and Enketo in production mode (npm start), uses sh tools/benchmark-headless-pdf.sh, which sends 12 requests simultaneously. I ran each test 3 times.

8 cores - all used by Enketo (cluster workers) ("max processes": 16).

It took 21.648 seconds.
It took 21.806 seconds.
It took 22.909 seconds.

See the three spikes below in cores 1, 3, 5 and 7 (cores 2, 4, 6 and 8 are barely used).

[Screenshot 2023-08-23 at 4:15:20 PM: per-core CPU usage]

Green represents CPU utilization by user applications, red represents CPU utilization by Mac OS X itself, and blue indicates low-priority tasks.

4 cores set to be used by Enketo ("max processes": 4)

It took 22.016 seconds.
It took 22.043 seconds.
It took 20.652 seconds.

[Screenshot 2023-08-23 at 4:20:19 PM: per-core CPU usage]

Note that the same 4 cores are used as in the previous set of tests!

1 core set to be used by Enketo ("max processes": 1)

It took 21.021 seconds.
It took 20.473 seconds.
It took 21.814 seconds.

[Screenshot 2023-08-23 at 4:39:19 PM]

Findings

  1. Clearly I am not understanding something about the number of processes when using clusters, as the results are essentially the same with 8, 4 and 1 cluster workers (each worker is one Enketo Express process). Maybe this is something specific to using headless browsers.
  2. 4 CPUs are used at 100% and 4 are not used at all, which may indicate that we can get at most a 100% improvement (twice as fast), so about 11 seconds (half of the roughly 22 seconds measured) is our goal.
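Each configuration was run only three times, so comparing min/mean/max may help in judging whether the differences between 8, 4 and 1 workers are real. A minimal sketch, assuming the `It took N seconds.` lines printed through the benchmark's `TIMEFORMAT` are piped in; the function name `summarize_runs` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read lines like "It took 21.648 seconds." from stdin
# and print the number of runs plus min / mean / max durations.
summarize_runs() {
    awk '/^It took [0-9.]+ seconds\.$/ {
            t = $3 + 0                    # third field is the duration in seconds
            n++; sum += t
            if (n == 1 || t < min) min = t
            if (n == 1 || t > max) max = t
         }
         END {
            if (n == 0) { print "no runs found"; exit 1 }
            printf "runs=%d min=%.3f mean=%.3f max=%.3f\n", n, min, sum / n, max
         }'
}

# Example: the three 8-worker runs reported above.
printf '%s\n' 'It took 21.648 seconds.' 'It took 21.806 seconds.' 'It took 22.909 seconds.' | summarize_runs
```

The example prints `runs=3 min=21.648 mean=22.121 max=22.909`. Applied to the 4-worker and 1-worker runs, the same helper gives means of about 21.570 s and 21.103 s, so the three configurations fall within about 1 s of each other, which is less than the roughly 1.3 s spread between runs within each configuration.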
MartijnR commented 1 year ago

I am less confident that the multi-threading approach will work here. We may not be able to run Puppeteer in a worker_thread (I am getting an "Illegal invocation" error), but I will try more later.

Also found this, which may be interesting: https://github.com/thomasdondorf/puppeteer-cluster

MartijnR commented 1 year ago

Old PDF Code

CPU core usage is similar to the previous tests.

8 cores - all used by Enketo (cluster workers) ("max processes": 16).

It took 28.170 seconds.
It took 26.171 seconds.
It took 23.776 seconds.

1 core set to be used by Enketo ("max processes": 1)

It took 29.548 seconds.
It took 23.964 seconds.
It took 23.400 seconds.

Findings

MartijnR commented 1 year ago

Linux server findings:

MartijnR commented 1 year ago

Script used (plus a variant that uses a record and sends requests to .../instance/view/pdf instead):

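#!/usr/bin/env bash
# Needs bash, not a strict POSIX sh such as dash: uses {1..12}, arrays, and the `time` keyword with TIMEFORMAT.
TIMEFORMAT='It took %R seconds.'

time {
    # Fire 12 PDF requests in the background so they reach Enketo simultaneously.
    for i in {1..12}
        do
            curl --user enketorules: -d "server_url=http://localhost:3000&form_id=LabsReconciliation&ecid=a" http://localhost:8005/oc/api/v1/survey/view/pdf &
            pids[${i}]=$!
        done

    # Wait for every request to finish so the timer covers the whole batch.
    for pid in "${pids[@]}"
        do
            wait "$pid"
        done
}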
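The script hard-codes 12 requests. To repeat jkeremian's comparison with 4, 6 and 12 concurrent requests, the same fan-out/wait pattern can be wrapped in a function that takes the concurrency as an argument. This is a sketch; `run_concurrently` is a hypothetical name, and counting failed PDF requests relies on curl's `--fail` option, which makes curl exit non-zero on HTTP error responses such as a 408.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command N times concurrently, wait for all copies,
# and print how many of them exited with a non-zero status.
run_concurrently() {
    local n="$1"; shift
    local -a pids=()
    local failures=0 i pid
    for ((i = 1; i <= n; i++)); do
        "$@" &                                       # start one copy in the background
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || failures=$((failures + 1))    # wait returns that job's exit status
    done
    echo "$failures"
}

TIMEFORMAT='It took %R seconds.'
# Example: 4 copies of a 1-second sleep should take about 1 second in total, not 4.
time run_concurrently 4 sleep 1

# Real use, assuming the endpoint and credentials from the script above:
#   time run_concurrently 6 curl --fail --silent --output /dev/null --user enketorules: \
#       -d "server_url=http://localhost:3000&form_id=LabsReconciliation&ecid=a" \
#       http://localhost:8005/oc/api/v1/survey/view/pdf
```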
jkeremian commented 1 year ago

https://github.com/OpenClinica/enketo-express-oc/assets/7663291/ad5b2c54-355c-47ed-9165-dfce4e38fe72

I'm using 4 threads/requests in this test.

MartijnR commented 11 months ago

@MartijnR what is the error code you receive if you send more requests than Enketo can handle?

MartijnR commented 11 months ago

> what is the error code you receive if you send more requests than Enketo can handle?

I'm getting an HTTP status code 408 (Request Timeout) response with body:

{
    "message": "PDF generation failed: Navigation timeout of 60000 ms exceeded"
}
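When sending more requests than Enketo can handle, it can help to count how many came back as 408 rather than reading each response. A sketch, assuming each request prints only its status code via curl's `--write-out '%{http_code}\n'` (the response body, including the JSON error message, is discarded with `--output /dev/null`); the function name `tally_status` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one HTTP status code per line from stdin and
# print "<code> <count>" for each distinct code, sorted by code.
tally_status() {
    awk 'NF { count[$1]++ } END { for (code in count) print code, count[code] }' | sort -n
}

# Illustrative input only (not measured data).
printf '%s\n' 200 408 200 | tally_status

# Real use, assuming the endpoint and credentials from the benchmark script:
#   { for i in {1..12}; do
#       curl --silent --output /dev/null --write-out '%{http_code}\n' --user enketorules: \
#         -d "server_url=http://localhost:3000&form_id=LabsReconciliation&ecid=a" \
#         http://localhost:8005/oc/api/v1/survey/view/pdf &
#     done; wait; } | tally_status
```

The illustrative example prints `200 2` followed by `408 1`.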