bcgov / bc-wallet-mobile

BC Wallet to hold Verifiable Credentials
Apache License 2.0
62 stars 49 forks source link

Unable to get person credential #2226

Open jleach opened 1 month ago

jleach commented 1 month ago

User reported unable to get person credential. Issue seems to be consistent for him but intermittent for others. On backgrounding / foregrounding the app message trickle in one at a time.

There may be two different issues:

  1. Some times the wallet hangs and no messages (basic or offer) are received. Restarting the wallet fixes this and both messages arrive.
  2. Some times the offer is received but the wallet does not automatically accept the offer.

1. No Messages

2. Hang

  1. This issue can be reproduced by restarting the app and get a person credential.
  2. Then I delete it, and repeat the process. This time I pause on a BCSC screen until you see the following three lines appear in the mediator logs (1.0.20 may only print one line so repetition may be irrelevant):
2024-10-18 21:10:18,830 aries_cloudagent.transport.inbound.ws ERROR Unexpected Websocket message type received: WSMsgType.CLOSED: None, None
2024-10-18 21:10:18,831 aries_cloudagent.transport.inbound.ws ERROR Unexpected Websocket message type received: WSMsgType.CLOSED: None, None
2024-10-18 21:10:18,831 aries_cloudagent.transport.inbound.ws ERROR Unexpected Websocket message type received: WSMsgType.CLOSED: None, None
  1. Complete the process.

You will see this in the mediator logs, at which point the BC Wallet app will be stalled at "Just a moment....". For both 1.0.20 and 1.0.21 it repeats 3x.

2024-10-18 21:10:25,945 aries_cloudagent.core.conductor WARNING Cannot queue message for delivery, no supported transport
2024-10-18 21:10:26,166 aries_cloudagent.core.conductor WARNING Cannot queue message for delivery, no supported transport
2024-10-18 21:10:26,529 aries_cloudagent.core.conductor WARNING Cannot queue message for delivery, no supported transport
cvarjao commented 1 month ago

While we can do work on the server/mediator side to increase stability/availability, the wallet should not assume that a websocket connection is long lived. The connection can be interrupted at anytime by any reason, but the wallet should be resilient and try to re-connect.

jleach commented 2 weeks ago

Created some test mechanics here. Unable to reproduce outside of the mobile environment (on a mobile device).

jleach commented 2 weeks ago

Current Findings:

Next Steps:

jleach commented 1 week ago

There is no real substance to implicit message pickup. It is just a matter of having a long lived connection (WebSocket) and sending a trust ping. Assuming then the mediator is smart enough to figure out it has N messages queued up and it will send them all, in the proper order, to the client (wallet):

From MediationRecipientApi.ts L302

// Implicit means sending ping once and keeping connection open. This requires a long-lived transport
// such as WebSockets to work

Looking through other code there are only some mechanics to deal with opening / closing the websocket. Nothing specific to starting pickup or anything else related to message pickup.

Either all messages are not being sent or something is getting jammed up with rxjs event processing.

Also noticed this error is ok, it results from Credo not being able to find the previously deleted credential. May not be a problem but the exception may be jamming up the get person logic.

 LOG  WARN :
{
  "message": "Failed to process revocation notification message",
  "data": {}
}

Next Steps:

jleach commented 1 week ago
 LOG  DEBUG :
{
  "message": "Successfully connected to WebSocket wss://ws.us-east.public.mediator.indiciotech.io/ws"
}
jleach commented 1 week ago

Important You need an existing un-revoked Person credential in the wallet to trigger this issue. The reason is, when the app comes into the foreground a trust ping is set which will trigger a message to be delivered. The first message is usually the revocation message which does nothing to advance the workflow. If you don't have a Person credential, the first credential is the offer which allows the workflow to work as expected.

jleach commented 1 week ago

In an Implicit connection a trust ping is sent to start the pickup strategy. When a trust ping is sent it triggers the mediator to deliver one message.

This issue can be kludged by wrapping the trust ping logic when starting mediation in a setInterval(async () => { ... }, timeoutInMs). This causes the trust ping to be sent every timeoutInMs which will in tern cause the mediator to deliver exactly 1 message until it's queue is drained.

jleach commented 1 week ago

TLDR; The "Implicit" message pickup strategy is working as expected which is why this issue can be reproduced across different technologies. There are limitations of the implicit pickup mechanism and Pickup v2 is recommend or implementing similar solutions to ensure reliable message delivery in ACA-Py agent systems.

Problems with Implicit Pickup:

Alternatives and Solutions:

jleach commented 1 week ago

I tried other message pickup strategies, both seem to produce similar results as Implicit. Credo does not seem to want to go into any other mode by default/

mediatorPickupStrategy: MediatorPickupStrategy.PickUpV2,
maximumMessagePickup: 10,

This triggers polling, but this error message is repated in the mediator logs:

mediator-1  | 2024-11-13 18:59:32,569 aries_cloudagent.core.dispatcher ERROR Message parsing failed: 
Unrecognized message type https://didcomm.org/messagepickup/2.0/status-request, sending problem report

and

mediatorPickupStrategy: MediatorPickupStrategy.PickUpV2LiveMode,
maximumMessagePickup: 10,

If I force it to use PickUpV2LiveMode I see the error in the wallet logs:

 LOG  WARN :
{
  "message": "Error initiating message pickup with mediator 841b4ac2-e769-4758-b5db-45b8b2df1a05",
  "data": {}
}
 LOG  ERROR :
{
  "message": "Error resuming agent message pickup, TimeoutError: Timeout has occurred"
}

And the same error in the mediator logs:

mediator-1  | 2024-11-13 19:02:43,858 aries_cloudagent.core.dispatcher ERROR Message parsing failed:
Unrecognized message type https://didcomm.org/messagepickup/2.0/status-request, sending problem report
jleach commented 1 week ago

The status request works with the Indecio cloud mediator:

{
  "message": "Received message with type 'https://didcomm.org/messagepickup/2.0/status', recipient key z6MkgbMs9wXEhWi6UUevkJ6NSfcykotYd71AN31V1VCpjnch and sender key z6MkgSYBM63iHNeiT2VSQu7bbtXhGYCQrPJ8uEGurbfGbbgE",
  "data": {}
}

I've confirmed that MediatorPickupStrategy.Implicit on the Indecio cloud mediator reproduces the same behaviour as ACA-py. However, MediatorPickupStrategy.PickUpV2LiveMode works as expected without polling. The strategy MediatorPickupStrategy.PickUpV2 works as expected but with polling.

NOTE: There is probably an issue in Credo where it always starts / tries implicit pickup first, but message retrieval can easily be restarted in PickUpV2LiveMode.