Add a sperate timeout parameter to getStatusChange()

dandrader commented 1 year ago

The timeout parameter of SCardGetStatusChange() and SCardCancel() usage are not equivalent in terms of resulting behavior. This creates room for behavioural differences and bugs when moving PC/SC C code to be run on top of the Web API.

Here, for instance, a timeout of zero is passed: https://github.com/OpenSC/OpenSC/blob/master/src/libopensc/reader-pcsc.c#L375 This cannot be implemented with SCardCancel.

When SCardCancel is called very close to a SCardGetStatusChange, there's no guarantee it will work and it depends on the PC/SC implementation. Even if the SCardGetStatusChange is processed before the SCardCancel, the service might still be setting up some internal state for SCardGetStatusChange and therefore that cancel might not work at that early moment.

Thus a getStatusChange() with a very short AbortSignal.timeout(ms) (eg: a couple of milliseconds) would most likely not get cancelled at all.

Preview | Diff

reillyeon commented 1 year ago

If I understand correctly there are two issues here:

There is a need for a "get card status without blocking" option which is what passing a timeout of zero accomplishes.
Canceling an operation can fail, leaving the operation still pending.

It does seem like passing an actual timeout time rather than calling SCardCancel is nicer for the backend implementation, as that timeout can be passed through to other systems as well and overall set expectations for what the caller wants.

I'm a little concerned about (2) however because an AbortSignal can only be signaled once but this implies that applications may be expected to retry calls to SCardCancel. Is that the case?

reillyeon commented 1 year ago

CC @domenic for the AbortSignal design feedback. It would be nice if an algorithm could determine whether an AbortSignal had a timeout associated with it. In that case I might suggest exchanging the timeout parameter for something more purpose-specific like immediate: true to indicate that the developer doesn't want the call to wait for any events, which is semantically different from specifying a timeout.

dandrader commented 1 year ago

There is a need for a "get card status without blocking" option which is what passing a timeout of zero accomplishes.

To accomplish this you have to set the dwCurrentState member of the SCARD_READERSTATE struct to SCARD_STATE_UNAWARE, which means "The application is unaware of the current state, and would like to know. The use of this value results in an immediate return from state transition monitoring services.".

Passing a 0 timeout in this case instead of, say, INFINITE, can cause ambiguity since that function may return either SCARD_S_SUCCESS or SCARD_E_TIMEOUT.

But since I see code in the wild also passing 0 as the timeout, it must be possible to express this usage with the Web API.

Canceling an operation can fail, leaving the operation still pending.

If it's sent very shortly after the getStatusCall request, like 1 ms, it fails. And the extra hops (mojo and also extension event in the ChromeOS case) taken until a request coming from the renderer reaches the PC/SC deamon doesn't help.

I'm a little concerned about (2) however because an AbortSignal can only be signaled once but this implies that applications may be expected to retry calls to SCardCancel. Is that the case?

Good question. I don't know if any application sends multiple SCardCancel due to the same blocked SCardGetStatusChange or if it's retried if they notice that the SCardGetStatusChange still didn't unblock.

dandrader commented 1 year ago

So what I feared does take place. I had a better look at OpenSC (as an example of a real PC/SC application), and it does call SCardCancel independently of any outstanding SCardGetStatusChange call, in general "clean up" operations (1, 2).

Therefore this must be possible with the Web API as well and I think we have no choice but to expose a cancel() method in SmartCardContext and remove the AbortSignal parameter from the getStatusChange API.

emaxx-google commented 1 year ago

I can add that, while unusual, there does exist software that makes multiple SCardCancel PC/SC calls with the same parameters in a row. I observed this in logs from some smart card middleware a few years ago (in that case it was a series of 4 successive SCardCancel calls that are followed by an SCardReleaseContext call).

That middleware or similar applications are still possibly doing this in the real world. Presumably, the reason they're doing this is exactly to mitigate the possible race condition between the "one thread entered sleep in SCardGetStatusChange" and the "other thread's SCardCancel call started to be processed" events.

reillyeon commented 1 year ago

Passing a 0 timeout in this case instead of, say, INFINITE, can cause ambiguity since that function may return either SCARD_S_SUCCESS or SCARD_E_TIMEOUT.

But since I see code in the wild also passing 0 as the timeout, it must be possible to express this usage with the Web API.

Given how OpenSC is using it I think it would be reasonable for callers to translate a 0 timeout into something like immediate: true because that's semantically what the application is looking for. In the algorithm for getStatusChange() we would add a special case for a 0 timeout precisely because we don't want the naive implementation which checks to see if more than now + timeout milliseconds have passed.

Now the question is whether it makes more sense to have immediate and signal flags or timeout and signal flags. The latter expresses the semantics more clearly while the latter gives the implementation more information about when the operation is likely to be cancelled (unless we get the ability to introspect a signal to see if it has an associated timeout).

I'm still curious for @domenic's perspective here but I don't have any opposition to either option.

So what I feared does take place. I had a better look at OpenSC (as an example of a real PC/SC application), and it does call SCardCancel independently of any outstanding SCardGetStatusChange call, in general "clean up" operations (1, 2).

Therefore this must be possible with the Web API as well and I think we have no choice but to expose a cancel() method in SmartCardContext and remove the AbortSignal parameter from the getStatusChange API.

If we can guarantee that signaling the AbortSignal is good enough to guarantee that the getStatusChange() operation is cancelled then we don't need to change the API. Just like how for transactions we're opinionated about how SCardEndTransaction() is called we can design the API so that SCardCancel() can only be called once (or really, AbortController.abort() can be called as many times as you'd like, it's just a no-op after the first call). We just need to be sure that we don't have the same bug that PC/SC implementations have which allows cancelling an operation to fail.

domenic commented 1 year ago

CC @domenic for the AbortSignal design feedback. It would be nice if an algorithm could determine whether an AbortSignal had a timeout associated with it.

That feels wrong to me. The purpose of AbortSignal.timeout() is to tap into the generic infrastructure, not to create a special-purpose path that only superficially looks like it's using the general AbortSignal infrastructure.

dandrader commented 1 year ago

If we can guarantee that signaling the AbortSignal is good enough to guarantee that the getStatusChange() operation is cancelled then we don't need to change the API. Just like how for transactions we're opinionated about how SCardEndTransaction() is called we can design the API so that SCardCancel() can only be called once (or really, AbortController.abort() can be called as many times as you'd like, it's just a no-op after the first call). We just need to be sure that we don't have the same bug that PC/SC implementations have which allows cancelling an operation to fail.

The end result of sticking with an AbortSignal-based API is that the responsibility of potentially issuing some extra SCardCancel call to ensure any outstanding SCardGetStatusChange is unblocked will shift from the application to the Web API implementation (browser). It might be that these extra SCardCancel we see here and there in PC/SC apps are unnecessary (NOOPS). But we cannot guarantee the actual behavior of the platform's PC/SC stack. Thus the burden is on us (browser impl).

On the other hand, the browser will need to have its own clean up measures anyway, to cover the case of a web app that is suddently closed in the middle of its getStatusChange() use.

Now the question is whether it makes more sense to have immediate and signal flags or timeout and signal flags. The latter expresses the semantics more clearly while the latter gives the implementation more information about when the operation is likely to be cancelled (unless we get the ability to introspect a signal to see if it has an associated timeout).

The latter (timeout + abortsignal) does make the use case of translating winscard.h C calls to the Web API easier to implement and guarantees equivalent behavior.

Will update this pull request for the "timeout + signal" API.

dandrader commented 1 year ago

Another data point: got feedback from a partner (about retrying SCardCancel): "This is not something I've ever seen in application code that uses the WinSCard API"

reillyeon commented 1 year ago

CC @domenic for the AbortSignal design feedback. It would be nice if an algorithm could determine whether an AbortSignal had a timeout associated with it.

That feels wrong to me. The purpose of AbortSignal.timeout() is to tap into the generic infrastructure, not to create a special-purpose path that only superficially looks like it's using the general AbortSignal infrastructure.

I agree that passing AbortSignal.timeout(0) and inferring special behavior from it would be an improper use of the generic infrastructure, which is why I suggested an immediate: true option instead. What I'm thinking about here is that sometimes the underlying platform operation understands the concept of a timeout and I wonder whether it is better from an implementation to pass through a hint about a timeout that is part of the AbortSignal chain or to always tell the underlying platform that there is no timeout and always abort the operation manually.

If we can guarantee that signaling the AbortSignal is good enough to guarantee that the getStatusChange() operation is cancelled then we don't need to change the API. Just like how for transactions we're opinionated about how SCardEndTransaction() is called we can design the API so that SCardCancel() can only be called once (or really, AbortController.abort() can be called as many times as you'd like, it's just a no-op after the first call). We just need to be sure that we don't have the same bug that PC/SC implementations have which allows cancelling an operation to fail.

The end result of sticking with an AbortSignal-based API is that the responsibility of potentially issuing some extra SCardCancel call to ensure any outstanding SCardGetStatusChange is unblocked will shift from the application to the Web API implementation (browser). It might be that these extra SCardCancel we see here and there in PC/SC apps are unnecessary (NOOPS). But we cannot guarantee the actual behavior of the platform's PC/SC stack. Thus the burden is on us (browser impl).

On the other hand, the browser will need to have its own clean up measures anyway, to cover the case of a web app that is suddently closed in the middle of its getStatusChange() use.

That is my thought as well. We already have to be sure we can really cancel an operation so it makes sense for us to take responsibility for that and provide a better interface to applications.

Now the question is whether it makes more sense to have immediate and signal flags or timeout and signal flags. The latter expresses the semantics more clearly while the latter gives the implementation more information about when the operation is likely to be cancelled (unless we get the ability to introspect a signal to see if it has an associated timeout).

(I realize that I said "the latter" twice. For the first instance I meant "the former".)

The latter (timeout + abortsignal) does make the use case of translating winscard.h C calls to the Web API easier to implement and guarantees equivalent behavior.

That sounds fine to me. I sketched out examples using both versions and there doesn't seem like a clear readability win.

WICG / web-smart-card

Add a sperate timeout parameter to getStatusChange() #10