Closed: mti-takayama-t closed this 1 year ago
@mti-takayama-t Thank you very much for submitting this. I'll test and review before the end of the week.
ian
@mti-takayama-t I've tested this and am about to merge it. Thank you for submitting.
What I've noticed with this fix is that, when a refresh is due, the call to `awake()` causes the Lambda to take longer to execute: `awake()` effectively waits on the main thread for the refresh to take place on the refresh agent thread. This pause was never intended to be visible to clients of the Lambda, but I don't think there's any way around it. Lambda execution times are typically under 20 ms, and in low-throughput scenarios there is therefore a good chance that the Lambda context will be asleep whenever a scheduled refresh should next take place. I think we have to take the hit, as per your fix, to ensure that refreshing occurs while the Lambda is active.
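A minimal sketch of the blocking behavior described above (the class and method names here are illustrative, not the library's actual API): the calling thread hands the refresh off to a separate thread, then blocks on a latch until the refresh completes, which is what makes the occasional slow invocation visible.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of an awake()-style blocking refresh trigger.
class RefreshAgentSketch {
    private final AtomicLong lastRefreshMillis = new AtomicLong(0L);
    private final long intervalMillis;

    RefreshAgentSketch(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    // Called at the start of an invocation. If a refresh is overdue (e.g. the
    // execution context was frozen past the scheduled time), trigger one now
    // and block the calling thread until it completes.
    boolean awake() {
        long now = System.currentTimeMillis();
        if (now - lastRefreshMillis.get() < intervalMillis) {
            return false; // refresh not yet due; return immediately
        }
        CountDownLatch done = new CountDownLatch(1);
        Thread refreshThread = new Thread(() -> {
            // ...fetch fresh endpoint info here...
            lastRefreshMillis.set(System.currentTimeMillis());
            done.countDown();
        });
        refreshThread.start();
        try {
            // Main thread waits for the refresh agent thread: this wait is
            // why a refreshing invocation takes visibly longer.
            return done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this shape, only the invocation that happens to trigger a refresh pays the extra latency; all others return immediately from the interval check.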
Because Lambda execution times are longer whenever a refresh occurs, the max duration of the Lambda has now gone up, from approximately 100 ms to around 700 ms. The average, though, remains below 20 ms. That max duration does have an impact, however: because the CFN template I supply restricts the concurrency of the Lambda function (to 1, by default), I've noticed, in reasonably high-throughput scenarios, that the Lambda is heavily throttled as many clients attempt to get info from it around the same time. That's not necessarily an issue, though, because the throttling is masked in the client, where calls to the Lambda take place in the background.
Overall, then, I think we trade better freshness for some increased latency. But this increase in latency should generally not be visible to the client.
I actually think `awake()` should also be called by any 'application' Lambda functions that use the Neptune Gremlin Client and a refresh agent (I do a lot of testing with Lambda functions that call Neptune). The same problem that affects the `neptune-endpoints-info-lambda` will affect these Lambda functions: refreshes may sometimes be scheduled for times when the Lambda context is asleep. I'm going to update the docs with this suggestion.
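The suggested pattern for an 'application' Lambda can be sketched as follows (a hypothetical handler shape; `RefreshAgent` here is a stand-in interface, not the library's actual type): call `awake()` at the top of every invocation so that any refresh that came due while the execution context was frozen happens while the Lambda is active.

```java
// Illustrative handler pattern: trigger any overdue endpoint refresh at the
// start of each invocation, before querying Neptune.
class HandlerSketch {
    interface RefreshAgent { void awake(); } // stand-in for the real refresh agent

    private final RefreshAgent refreshAgent;

    HandlerSketch(RefreshAgent refreshAgent) {
        this.refreshAgent = refreshAgent;
    }

    String handleRequest(String input) {
        refreshAgent.awake();         // may block briefly if a refresh is overdue
        return "handled:" + input;    // ...then query Neptune with fresh endpoint info...
    }
}
```

The cost is the same tradeoff discussed above: the occasional invocation that triggers a refresh runs longer, in exchange for endpoint info that stays fresh.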
Thanks again for the PR.
ian
@iansrobinson Thank you very much for your close testing and inspection. I agree with your analysis of the increased latency and the conclusion that the tradeoff is worthwhile.
We are very grateful to you for taking the time to merge the PR.
Issue
We experienced a long time lag before the `ClusterEndpointsRefreshAgent` in our app recognized a newly added Neptune instance via `lambdaProxy`. For example, after patching `neptune-endpoints-info-lambda`, invoking the Lambda with the default `pollingIntervalSeconds` (15 seconds) at a rate of 1 request every 15 seconds (plus about 2 seconds of CLI overhead) from one client resulted in the following refresh events. The intervals between the events seem somewhat random, at about 2-13 minutes, which is far longer than the default `pollingIntervalSeconds` of 15 seconds.
This PR tries to improve the behavior above.
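A toy model of why the refreshes lag (our reading of the behavior, with assumed numbers): the Lambda execution context is frozen between invocations, so the refresh agent's scheduler only gets CPU time during the short active window of each invocation. A refresh due at an arbitrary moment therefore rarely lands inside an active window.

```java
// Toy model: invocations occur every invocationPeriodMillis, and the context
// is only awake for activeMillis at the start of each one. All numbers below
// are assumptions taken from the scenario described above.
class FrozenContextModel {
    // True if a refresh due at dueMillis lands inside one of the active
    // windows [k * invocationPeriodMillis, k * invocationPeriodMillis + activeMillis).
    static boolean refreshLandsInActiveWindow(long dueMillis,
                                              long invocationPeriodMillis,
                                              long activeMillis) {
        return dueMillis % invocationPeriodMillis < activeMillis;
    }
}
```

With ~20 ms of active time per ~17 s invocation cycle, only about 0.1% of moments are "awake", which is consistent with scheduled refreshes firing far less often than every 15 seconds.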
Test Cases
1: `pollingIntervalSeconds` is 15 seconds by default
2: `pollingIntervalSeconds` is 15 seconds by default
3: `pollingIntervalSeconds` is 15 seconds by default
4: `pollingIntervalSeconds` is 15 seconds by default
Environment
(but rebased the PR onto main HEAD following v2.0.1)
(Compiling with JDK 1.8 resulted in an error from git-commit-id-maven-plugin: io.github.git-commit-id:git-commit-id-maven-plugin:5.0.0:revision (get-the-git-infos) on project gremlin-client: Execution get-the-git-infos of goal io.github.git-commit-id:git-commit-id-maven-plugin:5.0.0:revision failed: Unable to load the mojo 'revision' in the plugin 'io.github.git-commit-id:git-commit-id-maven-plugin:5.0.0' due to an API incompatibility: org.codehaus.plexus.component.repository.exception.ComponentLookupException: pl/project13/maven/git/GitCommitIdMojo has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0)
We would appreciate it if you could review, confirm, and even retest this PR.
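For anyone hitting the same build failure: class-file major versions map directly to Java releases (major version N corresponds to Java release N - 44), so the error says the plugin was compiled for Java 11 (class file 55) while the build JVM was Java 8 (class file 52). Building with JDK 11 or later avoids it.

```java
// Class-file major version N corresponds to Java release N - 44:
// 52 -> Java 8, 55 -> Java 11.
class ClassFileVersions {
    static int javaRelease(int classFileMajorVersion) {
        return classFileMajorVersion - 44;
    }
}
```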
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.