Open jleach opened 1 month ago
While we can do work on the server/mediator side to increase stability/availability, the wallet should not assume that a websocket connection is long lived. The connection can be interrupted at anytime by any reason, but the wallet should be resilient and try to re-connect.
Created some test mechanics here. Unable to reproduce outside of the mobile environment (on a mobile device).
Current Findings:
Message is undeliverable to connection b535c73e-3706-42b7-98d9-aa919d2f39b6 (BC Wallet Service (Dev))
is caused by starting message pickup when the WebSocket is closed. Perhaps but good to know.Next Steps:
There is no real substance to implicit message pickup. It is just a matter of having a long lived connection (WebSocket) and sending a trust ping. Assuming then the mediator is smart enough to figure out it has N messages queued up and it will send them all, in the proper order, to the client (wallet):
From MediationRecipientApi.ts L302
// Implicit means sending ping once and keeping connection open. This requires a long-lived transport
// such as WebSockets to work
Looking through other code there are only some mechanics to deal with opening / closing the websocket. Nothing specific to starting pickup or anything else related to message pickup.
Either all messages are not being sent or something is getting jammed up with rxjs
event processing.
Also noticed this error is ok, it results from Credo not being able to find the previously deleted credential. May not be a problem but the exception may be jamming up the get person logic.
LOG WARN :
{
"message": "Failed to process revocation notification message",
"data": {}
}
Next Steps:
rxjs
. If they do not show up, then perhaps the mediator is not delivering them all. Not sure how the cloud mediator would be impacted by this. LOG DEBUG :
{
"message": "Successfully connected to WebSocket wss://ws.us-east.public.mediator.indiciotech.io/ws"
}
Important You need an existing un-revoked Person credential in the wallet to trigger this issue. The reason is, when the app comes into the foreground a trust ping is set which will trigger a message to be delivered. The first message is usually the revocation message which does nothing to advance the workflow. If you don't have a Person credential, the first credential is the offer which allows the workflow to work as expected.
In an Implicit connection a trust ping is sent to start the pickup strategy. When a trust ping is sent it triggers the mediator to deliver one message.
This issue can be kludged by wrapping the trust ping logic when starting mediation in a setInterval(async () => { ... }, timeoutInMs)
. This causes the trust ping to be sent every timeoutInMs
which will in tern cause the mediator to deliver exactly 1 message until it's queue is drained.
TLDR; The "Implicit" message pickup strategy is working as expected which is why this issue can be reproduced across different technologies. There are limitations of the implicit pickup mechanism and Pickup v2 is recommend or implementing similar solutions to ensure reliable message delivery in ACA-Py agent systems.
Problems with Implicit Pickup:
Alternatives and Solutions:
I tried other message pickup strategies, both seem to produce similar results as Implicit. Credo does not seem to want to go into any other mode by default/
mediatorPickupStrategy: MediatorPickupStrategy.PickUpV2,
maximumMessagePickup: 10,
This triggers polling, but this error message is repated in the mediator logs:
mediator-1 | 2024-11-13 18:59:32,569 aries_cloudagent.core.dispatcher ERROR Message parsing failed:
Unrecognized message type https://didcomm.org/messagepickup/2.0/status-request, sending problem report
and
mediatorPickupStrategy: MediatorPickupStrategy.PickUpV2LiveMode,
maximumMessagePickup: 10,
If I force it to use PickUpV2LiveMode
I see the error in the wallet logs:
LOG WARN :
{
"message": "Error initiating message pickup with mediator 841b4ac2-e769-4758-b5db-45b8b2df1a05",
"data": {}
}
LOG ERROR :
{
"message": "Error resuming agent message pickup, TimeoutError: Timeout has occurred"
}
And the same error in the mediator logs:
mediator-1 | 2024-11-13 19:02:43,858 aries_cloudagent.core.dispatcher ERROR Message parsing failed:
Unrecognized message type https://didcomm.org/messagepickup/2.0/status-request, sending problem report
The status request works with the Indecio cloud mediator:
{
"message": "Received message with type 'https://didcomm.org/messagepickup/2.0/status', recipient key z6MkgbMs9wXEhWi6UUevkJ6NSfcykotYd71AN31V1VCpjnch and sender key z6MkgSYBM63iHNeiT2VSQu7bbtXhGYCQrPJ8uEGurbfGbbgE",
"data": {}
}
I've confirmed that MediatorPickupStrategy.Implicit
on the Indecio cloud mediator reproduces the same behaviour as ACA-py. However, MediatorPickupStrategy.PickUpV2LiveMode
works as expected without polling. The strategy MediatorPickupStrategy.PickUpV2
works as expected but with polling.
NOTE: There is probably an issue in Credo where it always starts / tries implicit pickup first, but message retrieval can easily be restarted in PickUpV2LiveMode
.
User reported unable to get person credential. Issue seems to be consistent for him but intermittent for others. On backgrounding / foregrounding the app message trickle in one at a time.
There may be two different issues:
1. No Messages
mediator.dev.animo.id
and that worked no better.2. Hang
You will see this in the mediator logs, at which point the BC Wallet app will be stalled at "Just a moment....". For both 1.0.20 and 1.0.21 it repeats 3x.