IBMStreams / endpoint-monitor

Nginx reverse proxy sample application to Streams REST operators.
Apache License 2.0
2 stars 4 forks source link

Signature check has side effect to performance #65

Open rnostream opened 4 years ago

rnostream commented 4 years ago

An active signature check in actual implementation has side effect of request performance.

js_set $signatureSecret readSecret;

This set's the readSecret function as handler being called for EACH!!! request accessing the nginx variable $signatureSecret.

It doesn't set a value already in this line!

The handler is called when the nginx variable is used in an nginx 'location' AND!!! also in the JS code via r.variables.signatureSecret. This is by design of this nginx extension, accessing a JS_SET nginx variable during processing of request will call the handler with the request as argument and returns e.g. a string which is used in 'location' config or JS code. It's a somehow strange implementation as nothing points to this fact as var secret_key = r.variables.signatureSecret; looks as normal JS assignment, but it isn't.

I suppose that the intention was a one-time-only call to the readSecret function to read the secret from only once, isn't it? So another way of reading the signature secret only once should be used (may be using SET and assigning the secret value as string).

ddebrunner commented 4 years ago

I think @Karsten12 did some testing on this and the actual file is only being read once.

@rnostream Do you have any performance numbers that show this?

rnostream commented 4 years ago

During discovering the EM code to get familiar with it's functionality in detail last week the question came up when readSecret function is called. This leads to the js_set and the JS nginxs module. It's documentation tells that the functions being set during the different provided JS actions are handlers being called with actual request. All samples I found used the nginx variable set with js_set in a 'location' section where the string returned from the handler (generated for the actual request) is used. So there was a little uncertainty from my side and I put code into readSecret and checkhttp when they are called. And I discovered that that readSecret is called just by the fact accessing r.variables.signatureSecret. A really strange behavior from JS perspective (it's just an assignment) but works exactly as designed "resolve the variable when being used by calling the handler with the actual request".

So for EM this means that for every signature check the signature file is read.

Did trying to understand the code only because signature check didn't work and no documentation was found.

ddebrunner commented 4 years ago

@Karsten12 Can you comment what you discovered, I thought the file was only being read once due to caching by the underlying javascript library.

Karsten12 commented 4 years ago

I do believe @rnostream is correct, although I'm not sure how to verify it until we get an instance of EM with more disk utilization. My initial testing used the unix stat command, and some print statements inside the read signature function to monitor the last time the file was being read. After many tests request I found that although for each query the print statement was invoked (ie the function was being called per request), the stat command showed no change. After further research, while the default option for the NJS function readFileSync is to read from cache, it appears this is OS level cache, not Nginx, hence why with my minimal use/testing the signature file was still in the cache.

Karsten12 commented 4 years ago

1 possible solution is to modify the _has_signature_secret in app.py so that if the signature-secret file is present, create a new .conf file in /var/opt/streams-endpoint-monitor/job-configs/*.conf or another location that is included in the main nginx conf's server block. The file will contain the line set $signatureSecret <value_of_secret>; , and we can remove the readSecret function. Its just a thought, can't test it until I can get my local cluster up and running

ddebrunner commented 4 years ago

Before we jump into any changes, it would be good to have a set of tests to understand what the performance impact actually is.

rnostream commented 4 years ago

OK, we may let it as it is at moment. But just for getting a clear flow and better understanding it should be changed later, w/o the need for someone to know the details on JS_SET and possible caching.

If this would be really used as high load ingest scenarios to Streams it may have impact. Even on the background that networking inside OpenShift may have significant impact on network performance (virtual networking over the SDN's causing overhead for each packet by passing several additional entities compared to bare metal Streams).