awslabs / aws-c-http

C99 implementation of the HTTP/1.1 and HTTP/2 specifications
Apache License 2.0
136 stars 42 forks source link

Use promises in elasticurl #439

Closed evanmiller closed 6 months ago

evanmiller commented 1 year ago

Issue #, if available:

Description of changes:

promise.h neatly encapsulates what the mutex and condition variable are doing in the elasticurl code. I found the code to be more succinct and comprehensible by using it.

Tested locally with the success and failure code paths.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

graebm commented 1 year ago

Solid change, but promise.h hasn't been used in any public code yet. I was actually considering removing it from the codebase. Less code is best code, its unit test occasionally fail, and since it only offers a blocking API, it's not useful to building our non-blocking libraries. It's only handy in unit test and samples like this.

Sorry for being a jerk. I'm going to let this PR hang in limbo until the fate of promise.h is decided (this month, likely). We'll remove it, fix it, or replace it with something similar.

evanmiller commented 1 year ago

Thank you for the swift attention to my patch.

Based on my reading of this SO thread and a similar data race here, it seems that aws_promise_complete and aws_promise_fail should perform their CV notifications while still holding the mutex rather than immediately after. If there is a spurious wakeup in the main thread while the worker thread is between these two lines

https://github.com/awslabs/aws-c-common/blob/9848a8cf3979c9a820027f705ca985091885afc0/source/promise.c#L85-L86

then the main thread may proceed to destroy the condition variable before the worker can execute its notification. According to this theory, swapping those two lines of code (as well as the last two lines of aws_promise_fail) will fix the race condition.

evanmiller commented 1 year ago

As to the question of whether to keep promise.h: The race condition discovered here would also have presented itself if the original elasticurl code had attempted to clean up its condition variable. Mutexes and condition variable logic is hard to get right; as long as the library provides mutex and cvar primitives, my vote would be to keep (and fix) promises in order to prevent similar data race conditions in client code.

bretambrose commented 1 year ago

As to the question of whether to keep promise.h: The race condition discovered here would also have presented itself if the original elasticurl code had attempted to clean up its condition variable. Mutexes and condition variable logic is hard to get right; as long as the library provides mutex and cvar primitives, my vote would be to keep (and fix) promises in order to prevent similar data race conditions in client code.

Just a minor nitpick, but the current elasticurl implementation does not have a race condition. Predicate-based waits do not suffer from problems when a spurious wakeup occurs.

evanmiller commented 1 year ago

Just a minor nitpick, but the current elasticurl implementation does not have a race condition. Predicate-based waits do not suffer from problems when a spurious wakeup occurs.

I must be misunderstanding what a spurious wakeup entails. My mental model for the data race is:

Worker sets exchange_completed to true Worker unlocks mutex Main thread spuriously wakes up (returns from pthread_cond_wait) Main thread checks predicate condition – condition is true Main thread returns from aws_condition_variable_wait_pred Main thread continues Worker performs notify

In my current understanding, there no data race only because elasticurl leaks the condition variable resource rather than freeing it. I would appreciate any clarification though.

bretambrose commented 1 year ago

Just a minor nitpick, but the current elasticurl implementation does not have a race condition. Predicate-based waits do not suffer from problems when a spurious wakeup occurs.

I must be misunderstanding what a spurious wakeup entails. My mental model for the data race is:

Worker sets exchange_completed to true Worker unlocks mutex Main thread spuriously wakes up (returns from pthread_cond_wait) Main thread checks predicate condition – condition is true Main thread returns from aws_condition_variable_wait_pred Main thread continues Worker performs notify

In my current understanding, there no data race only because elasticurl leaks the condition variable resource rather than freeing it. I would appreciate any clarification though.

Invoking the library cleanup will also invoke a join function (https://github.com/awslabs/aws-c-common/blob/main/include/aws/common/thread.h#L169-L178) that does not return until all threads spawned by any of the aws-c-* libraries are fully joined.

Now if you cleaned up the condition variable before that happened, then yes, you could theoretically destroy it while it was still in use.

evanmiller commented 1 year ago

With https://github.com/awslabs/aws-c-common/pull/1023 merged, it would be great if someone could re-run the failed sanitizer test.

waahm7 commented 6 months ago

Thank you for the PR, sorry; we have decided to remove the promise.h class, so I am closing this PR as well for now.