bame-da commented 2 years ago

I confirm that, if this is a bug that has security implications, I already contacted security@digitalasset.com and followed the responsible disclosure policy.

I confirm that this is not a question or a request for technical support by the community, for which the Daml forum is available.

Affected Daml version

1.18.0

Bug description

The getCommandsInFlight function in the trigger rule returns commands for which a successful completion was already seen in the trigger update.

Specifically, I have this code in my automation (blinded for clarity)

type SettlementState = (_, _ , Map.Map CommandId Completion)

settlementInit : TriggerInitializeA SettlementState
settlementInit = return (..., ..., Map.empty)

settlementUpdate : Message -> TriggerUpdateA SettlementState ()
settlementUpdate m = do
  case m of
    MCompletion c -> do
      (..., ..., completions) <- get
      case ... of
        ... -> put (..., ..., Map.insert c.commandId c completions)
    _ -> return () 

settlementRule : Party -> TriggerA SettlementState ()
settlementRule p = do
  ...
  commandsInFlight <- getCommandsInFlight
  let commandIds = Map.keys commandsInFlight
    forA_ commandIds (\commandId -> do
      case Map.lookup commandId completions of
        None -> return ()
        Some completion -> error $ "In-flight CommandId " <> show commandId <> " has already completed with completion " <> show completion
  ...

I don't think that error should ever occur, but it does, and with a successful completion:

15:20:09.019 [TriggerService-akka.actor.default-dispatcher-5] ERROR com.daml.lf.engine.trigger.Machine$ - Error: Unhandled exception: DA.Exception.GeneralError:GeneralError@86828b98{ message = "In-flight CommandId CommandId "5d294eba-2a37-4c4f-9365-373a12054ff4" has already completed with completion Completion {commandId = CommandId "5d294eba-2a37-4c4f-9365-373a12054ff4", status = Succeeded {transactionId = TransactionId "0A2436323538646563322D343766352D333030392D623561662D626532313039643662323634"}}" }

To reproduce

Repo/Branch https://github.com/DACH-NY/asset-refapp/tree/trade-sharder contains a WIP trigger for some fancy settlement logic. To reproduce, follow //SettlementDemo/Main/Readme.md, but instead of ./start_sequential.sh defaultSequentialConfig.json, run ./start_partyasset.sh defaultPartyAssetConfig.json.

Expected behavior

Don't throw any errors in the section of code after -- BUG? in PartyAssetSettler.daml.

Additional context

Add here any other context about the problem here. This includes but is not limited to:

OS information
Daml code relevant to reproduce the faulty behavior
logs in textual format containing relevant information
for Navigator or the documentation: screenshots

bame-da commented 2 years ago

Or possibly getCommandsInFlight just never gets cleared?

cocreature commented 2 years ago

Triggers clear once we get the transaction corresponding to your submission not the completion. This is important because otherwise ACS changes are not yet applied and you’ll end up repeating a change.

Unfortunately, that runs in a bug in sandbox classic where some submissions do not produce a corresponding transaction (despite the completion having a transaction id). This is particularly common if you mess up your template id filters to not include the templates that would be modified in the ACS by your submission. See https://github.com/digital-asset/daml/issues/6975 for the corresponding issue.

Haven’t yet looked at your reproducer but you do have template filters which makes this a plausible explanation. As a quick fix, try switching to AllInDar and see if that fixes it. Once we fix the bug where empty transactions get dropped, we’ll start emitting a loud warning suggesting to look into your template id filters if we get an empty transaction because that makes no sense for a trigger to produce.

bame-da commented 2 years ago

Oh 💩 . My entire day's worth of bug chasing was due to the fact that I assumed the trigger ACS would be updated after processing the completion rather than the transaction. Would it make sense to open an issue to hold completions until after the transaction?

bame-da commented 2 years ago

No, I'm still preplexed how to do this right... In case it matters, I'm on Sandbox, not Sandbox classic.

When I clear my own locks based on completions, I get dirty reads:

Submit TX1 consuming cid1 : ContractId Asset with commandId C1.
Receive an MCompletion with status Success and commandId C1.
query @Asset in the rule. Get cid1 as part of results.
have a bad time.

If I try to clear my locks based on transactions (MTransaction), they don't clear at all. I don't seem to be getting any commandIds on MTransaction.

So how can I keep my own pending set?

bame-da commented 2 years ago

Using AllInDar does not fix the issue of not seeing an MTransaction or a failed MCompletion with the commandId of a submission.

bame-da commented 2 years ago

Ohhh, the issue is with Sandbox, not Sandbox Classic! The problem is indeed that I'm sending (perfectly valid) transactions that don't change the ledger at all. That's why the locks don't release when using MTransaction.

Still keeping this open to discuss

Would it make sense to open an issue to hold completions until after the transaction?

cocreature commented 2 years ago

Ohhh, the issue is with Sandbox, not Sandbox Classic! The problem is indeed that I'm sending (perfectly valid) transactions that don't change the ledger at all. That's why the locks don't release when using MTransaction.

why are you doing that? I view triggers as functions that try to get the ACS to a target state. Sending transactions that don’t change the ACS doesn’t make sense in that model, you’ll just keep sending the same transaction because you never get closer to your target state.

Would it make sense to open an issue to hold completions until after the transaction?

Could you expand on how that would help? If I understand you correctly you are not suggesting to change the commands in flight just the completion message. My first instinct is that we should not mess with ledger API message ordering. The fact that you can get completions before transactions is not something specific to triggers.

bame-da commented 2 years ago

why are you doing that?

I basically have a helper contract with a nonconsuming choice RunBatch which takes a bunch of lists of work to do. My trigger prepares that work and then calls RunBatch. Sometimes there is no work to do so RunBatch got called with empty lists in which case it's a noop.

There are obvious solutions to this: Either don't call RunBatch for an empty batch, which is what I implemented to fix the issue, or use a consuming choice.

Could you expand on how that would help? My first instinct is that we should not mess with ledger API message ordering.

I see a UX problem in how the streaming API (the update function) and the state/query API (the rule function) interact in Triggers. As a general rule, I would expect that after seeing a command/transaction TX commit at offset T on API X, the next time I query API X in the recommended way, the effect of TX is incorporated in the query.

This is trivially true for the gRPC API: After I see a completion of TX and I call ACS + transaction stream up to offset T, I see the effect of TX. This is also true for the JSON API. If I see any sign of the transaction on websockets, or get a response for my submit call, the next time I query, the effect of TX is included.

But this isn't true in Triggers. I can see a completion in update, but am then later presented with a queriable state in rule, which the change has not yet been incorporated in.

It's possible to work around by keeping custom state based on Transaction events and failed completions, but this caught me out big time for a simple use case like maintaining my own kind of "pending set".

cocreature commented 2 years ago

This is trivially true for the gRPC API: After I see a completion of TX and I call ACS + transaction stream up to offset T, I see the effect of TX. This is also true for the JSON API. If I see any sign of the transaction on websockets, or get a response for my submit call, the next time I query, the effect of TX is included.

I’m not sure I agree with that. For the gRPC API, I’d recommend that you keep a long running transaction stream & completion stream open instead of streaming transactions between offsets. And at point you run into the same issue you have for triggers.

If I understand you correctly, working with transactions as successful completions instead of using the successful completions directly would solve your usecase at little to no increase in complexity (e.g. no need to keep state that you wouldn’t otherwise have to keep)? (Ignoring the issue around empty transactions missing which is a bug that will be fixed on the ledger API side).

So it’s not that you cannot do what you want or that it’s too complex but that it’s confusing? I think that’s a fair point but storing messages and reordering them doesn’t seem like a great solution to that issue. I’ll have a think on whether I can come up with something else. If not, clearer docs might help a bit at least.

digital-asset / daml

[BUG] Triggers show already completed commands as in-flight #12233

Affected Daml version

Bug description

To reproduce

Expected behavior

Additional context