Sandbox classic and Sandbox next handle empty transactions differently

cocreature commented 3 years ago

Consider the following DAML Model:

template T
  with
    p : Party
  where
    signatory p
    nonconsuming choice C : ()
      controller p
      do pure ()

together with this REPL session (it doesn’t matter if you use the REPL or anything else)

daml> p <- allocatePartyWithHint "p" (PartyIdHint "p")
daml> cid <- submit p (createCmd (T p))
daml> submit p (exerciseCmd cid C)

If you now look at the transaction stream, you will see two transactions for Sandbox classic:

{
  "transactions": [
    {
      "transactionId": "3",
      "commandId": "2890353f-9788-4b4b-984e-5ace58e30ff2",
      "effectiveAt": "2020-08-03T19:38:37.989539Z",
      "events": [
        {
          "created": {
            "eventId": "#3:0",
            "contractId": "#3:0",
            "templateId": {
              "packageId": "ed52b012e14eba3719e0edd43e0db33e5140ce4fa6c084de844b90001c2d387c",
              "moduleName": "Main",
              "entityName": "T"
            },
            "createArguments": {
              "fields": [
                {
                  "value": {
                    "party": "p"
                  }
                }
              ]
            },
            "witnessParties": [
              "p"
            ],
            "agreementText": "",
            "signatories": [
              "p"
            ]
          }
        }
      ],
      "offset": "0000000000000003"
    }
  ]
}
{
  "transactions": [
    {
      "transactionId": "4",
      "commandId": "b94d4442-09c3-4116-9e91-e8b3c18556c3",
      "effectiveAt": "2020-08-03T19:38:52.187258Z",
      "offset": "0000000000000004"
    }
  ]
}

But for sandbox next, you get only one

{
  "transactions": [
    {
      "transactionId": "0A2432366565653138662D396637352D336663312D393334612D323766336539346637396164",
      "commandId": "8bf38d6e-ce70-4354-826b-8bee12e84162",
      "effectiveAt": "2020-08-03T19:35:28.286117Z",
      "events": [
        {
          "created": {
            "eventId": "#0A2432366565653138662D396637352D336663312D393334612D323766336539346637396164:0",
            "contractId": "00aa688090465067ffe871ed9e439befd56ad303e7059b10cc2f59f733d783d722",
            "templateId": {
              "packageId": "ed52b012e14eba3719e0edd43e0db33e5140ce4fa6c084de844b90001c2d387c",
              "moduleName": "Main",
              "entityName": "T"
            },
            "createArguments": {
              "fields": [
                {
                  "value": {
                    "party": "p"
                  }
                }
              ]
            },
            "witnessParties": [
              "p"
            ],
            "agreementText": "",
            "signatories": [
              "p"
            ]
          }
        }
      ],
      "offset": "00000000000000040000000000000000"
    }
  ]
}

The completion stream contains both commands in both cases and it also contains a transactionId in both cases.

I would expect to get the empty transaction like I do in Sandbox classic.

cocreature commented 3 years ago

This is not just limited to transactions that don’t have any creates and archives. The problem extends to transactions that do have creates and archives but after applying the transaction filter, they are empty. Consider the following example:

template T
  with
    p : Party
  where
    signatory p
    nonconsuming choice C : ()
      controller p
      do pure ()
    nonconsuming choice C2 : ContractId T2
      controller p
      do create (T2 p)

template T2
  with
    p : Party
  where
    signatory p

Together with the REPL session

daml> p <- allocatePartyWithHint "p" (PartyIdHint "p")
daml> cid <- submit p (createCmd (T p))
daml> submit p (exerciseCmd cid C2)
<contract-id>

On Sandbox next, if you filter the transaction stream for T you will see a single transaction for the first command. if you filter for T2 you will only see the second command.

On Sandbox classic, you always see both but one of them is empty.

ghost commented 3 years ago

Sandbox Next uses kvutils under the hood. My guess is that the issue is in this underlying library.

cocreature commented 3 years ago

Yeah seems very likely.

gerolf-da commented 3 years ago

@cocreature: I would say the behavior of sandbox classic doesn't make sense. The empty transaction shouldn't be returned on the flat transaction stream. What is curious though is why there is a difference though between sandbox and sandbox classic. the api server serving the data should be the same.

cocreature commented 3 years ago

@gerolf-da triggers currently rely on the behavior of sandbox-classic and I think we need to support this usecase in some form (this doesn’t necessarily mean keeping the behavior). Let me explain why and how:

Triggers keep a pending set and track a subset of the ACS (filtered by user-specified template types).
If the trigger submits a command (via the asynchronous command submission service), we add something to the pending set until the command has been successfully applied.
There are two options for when we should consider a command to be successfully applied:
1. We got the completion for the command.
2. We got the corresponding transaction.

For an empty transaction, the completion is perfectly adequate. However, consider a non-empty transaction: In that case it only makes sense to remove something from the pending set once you have updated the ACS, i.e., applied the transaction. If you remove it once you got the completion before you got the transaction event (there is no ordering between whether you get the transaction or the completion first afaik), you now have removed it from the pending set but the condition that produced the command (i.e. the ACS) hasn’t changed yet so you’ll submit it again defeating the whole point of having the pending set.

Therefore, triggers currently always use 2. This of course only works if there is actually a transaction which seems to always be the case on sandbox-classic. The empty transaction doesn’t actually change the ACS of course so in this case it would also be fine to use the completion. However, as far as I can tell, the trigger runner has no way to tell if the transaction will be empty.

In summary the usecase I would like to see supported (and which is supported by sandbox-classic) is the following:

Submit a command using the asynchronous command submission service.
Wait for the event that provides you with the state change caused by that command.

If the completion event would tell me that I would not get a transaction if it is empty, that is of course also perfectly fine.

gerolf-da commented 3 years ago

I see, that makes sense. Thanks.

What I can't wrap my head around is that the api server comes up with a different result, considering that it is independent of the backing ledger implementation. We'll need to look at the actual transactions and check if they are different.

gerolf-da commented 3 years ago

@cocreature: Do you know whether you see the same behavior in Sandbox Classic with both in-memory mode and the SQL backend (e.g. with H2)?

cocreature commented 3 years ago

@gerolf-da I only tested in-memory so not sure. I can test sql but probably not before next week.

gerolf-da commented 3 years ago

@cocreature: In that case, don't worry about it. We'll check that as part of the analysis. Thanks for reporting it.

meiersi-da commented 2 years ago

Observations

None of the RPCs in the com.ledger.api.v1.TransactionService specifies whether transactions without a single event matching the filter should be included or not.
The implementation of these RPC on top of the IndexDB works by querying for all create and exercise events matching the given filter, and considers a transaction to be present if and only if there is at least one event matching the filter. The transaction metadata (e.g., workflow_id or ledger_effective_time) is read off the first matching event, which works as the transaction metadata is stored in a denormalized form with each event. See e.g. the participant_events_create table.
The filters for "flat" transactions work based on stakeholders, while the filters for transaction trees work based on the informees of an event. None of the above filters consider the submitters of the command when matching events.
In the current implementation, a flat transaction or a transaction tree can be retrieved with requesting_parties = P if and only if that transaction is present in the corresponding transaction stream filtered to events visible to parties P.
The offsets used in the completion stream and the offsets used in the transaction streams stem from the same domain. So, so the transaction for a completion at offset off will have offset off in the transaction stream.

Options for solving @cocreature 's use-case:

In summary the usecase I would like to see supported (and which is supported by sandbox-classic) is the following:

Submit a command using the asynchronous command submission service.

Wait for the event that provides you with the state change caused by that command.

Show empty transactions for matching completions: adjust the definition of the transaction streams such that an empty transaction is shown whenever the filtering parties could see the completion for this transaction.
Include response-offset in transaction streams: include a string response_offset field in the GetTransaction(Trees)Response that signals up to which offset all matching transactions have been returned by the stream. Responses with no matching transactions are sent whenever the ledger end changes, but no new matching transactions where added. Note that this does not disclose more information than is already available via the GetLedgerEnd RPC. The script runner can thus compare the completion offset to the response_offset to determine whether it has seen all relevant effects for a completed command.

Sidenote: the completion stream already contains a response_offset in the form of a Checkpoint.

Recommendation

Note that there are many ways how the filters applied to the results of the different RPCs can be defined. The current choices might not be great, but are consistent and support a performant implementation. Note also that the code serving the transaction streams is driven by changes to the ledger end offset, and can thus easily add the response_offset.

I (@meiersi-da) would thus recommend the following changes to solve @cocreature 's use-case:

Add the response_offset field as described above, including a test for the case of no-matching transactions.
Document the fact that the GetTransaction(Tree)ById RPCs only consider transactions visible in the corresponding streams to make it less surprising.

@cocreature @gerolf-da what do you think?

cocreature commented 2 years ago

Given that there are no heartbeats (afaik) on the transaction stream, does that mean that I have to wait until I get the next transaction which can potentially be a very long time depending on activities on the ledger? That seems somewhat problematic to me.

I’m also not clear on how your suggestion would affect SubmitAndWaitForTransaction on the command service which currently also fails on this.

meiersi-da commented 2 years ago

No you would not have to wait. Whenever you can see a completion at offset off1 you'll also get a new response on the transaction stream with an offset off2 >= off1. See

Responses with no matching transactions are sent whenever the ledger end changes, but no new matching transactions where added.

As for SubmitAndWaitForTransaction, that is a different problem than the Trigger-runner, but needs solving as well. Thanks for pointing it out.

There, I'd recommend that we change the specification for GetTransactionById such that NOT_FOUND is only returned if GetTransactionTreeById were to return NOT_FOUND; i.e., if none of the requesting_parties is an informee on an event of the transaction. I expect this to be sufficient as the requesting_parties are set equal to the actAs parties, and at least one of these must be able to see the root event for the command it submitted.

I recommend to explicitly not incorporate the submitters into the filtering conditions, as we are reconsidering how submissions are authorized and tracked.

cocreature commented 2 years ago

Ah missed that sentence, sounds good to me then.

As for SubmitAndWaitForTransaction, that is a different problem than the Trigger-runner, but needs solving as well. Thanks for pointing it out.

I think it makes sense to design a solution together even if the implementation might be decoupled. The trigger runner is effectively implementing submitAndWaitForTransaction on top of the asynchronous command service + completion + transaction stream. Ideally that would behave comparable to using the synchronous command service.

meiersi-da commented 2 years ago

Ah missed that sentence, sounds good to me then.

:+1:

As for SubmitAndWaitForTransaction, that is a different problem than the Trigger-runner, but needs solving as well. Thanks for pointing it out.

I think it makes sense to design a solution together even if the implementation might be decoupled. The trigger runner is effectively implementing submitAndWaitForTransaction on top of the asynchronous command service + completion + transaction stream. Ideally that would behave comparable to using the synchronous command service.

Yes, ideally they would work the same while exhibiting good performance characteristics. The problems we have are:

The current definitions of the services around transactions mush together read access control and filtering (as a performance optimization).
The IndexDB structure does not represent the transaction metadata explicitly.

I expect us to build services with cleaner definitions as part of revisiting how package and party and user/application-id management is done on the Ledger API, which we plan to do soon. Until then, I'd like us to aim to find solutions that work for the use-cases we currently have without requiring expensive IndexDB schema changes or lowering performance of the tx-streams.

I believe the two changes for GetTransactionById and for GetTransaction(Tree)s described above satisfy these criteria. Do you agree @gerolf-da and @cocreature ?

cocreature commented 2 years ago

Yes I agree :+1:

gerolf-da commented 2 years ago

@meiersi-da: regarding option 2, would you limit the number of "empty" responses the ledger api server should send out? On a sufficiently busy ledger It would produce $DISPATCHER_POLLING_FREQUENCY (currently 10) number of empty responses on the transaction stream.

meiersi-da commented 2 years ago

@gerolf-da : we could at the cost of adding extra latency for the trigger runner in case there is no transaction. I believe with the current "separate-streams" API design, we are forced to just send the empty responses.

Question: is sending these actually a problem?

gerolf-da commented 2 years ago

No, it's not a problem. We should go with it.

meiersi-da commented 2 years ago

Thank @gerolf-da . As per our discussion today, the next step will be on you to synchronize with @mziolekda .

nmarton-da commented 2 years ago

@cocreature the machinery behind submitAndWait is listening to completion stream, and if the corresponding completion is identified it looks the TX up with a point-wise query (that would beGetTransactionById, GetFlatTransactionById on the API) Would this be a suitable usage for the problem here? :crossed_fingers:

cocreature commented 2 years ago

@nmarton-da let me double check I understand what you are suggesting:

Currently both the transaction stream and pointwise transaction queries fail for empty transactions.

You are suggesting to fix only the latter. Applications like triggers would then only listen to the completion stream and get the transactions by doing pointwise transaction queries for each completion.

Is that correct?

If so, I think this can work but it seems fairly inefficient based on my understanding. I thought tailing transaction streams is/will be (with recent changes like llp) will be more efficient than doing lots of pointwise queries.

nmarton-da commented 2 years ago

If so, I think this can work but it seems fairly inefficient based on my understanding. I thought tailing transaction streams is/will be (with recent changes like llp) will be more efficient than doing lots of pointwise queries.

Pointwise queries are targeted to be added to In-Memory Fan-Out, since, well, it is used by submitAndWait machinery. After that it will be very efficient.

cocreature commented 2 years ago

If you’re confident that the performance will be reasonable, I think this can work. It feels a bit odd to have to resort to only the completion stream + pointwise queries instead of subscribing to the completion & transaction stream but I don’t feel super strongly about this.

cocreature commented 2 years ago

@remyhaemmerle-da fyi since you own triggers these days: This would mean you would have to only subscribe to the completion stream and whenever you get a successful completion, query by transaction id to get the corresponding transaction.

nmarton-da commented 2 years ago

We had a very fruitful discussion on this topic with @meiersi-da here I try to summarize the conclusions:

TLDR

The solution I proposed above (completion+point-wise lookup) has some limitation, and it needs to be evaluated whether it is usable for your use case. Please read below for more info. @remyhaemmerle-da @moritzkiefer-da
We could kick off work for a new streaming endpoint "flat-transaction with completion" (more details below), which most probably would be a good solution for this problem. Here priories need to be aligned. @moritzkiefer-da @mziolekda @meiersi-da
"Unified Ledger API streams" would also solve the problem here, but AFAIK this wont happen any time soon.
There is a good chance there are bugs in the current implementation (even that originally the empty transaction concept worked and got broken in time, which would explain the difference between sandbox and sandbox-next in the original problem statement). @mziolekda here we could have user impact as well.

Details:

On the suggested using the point-wise lookup and completion

This approach suffers from multiple drawbacks:

The point-wise lookup does not have TransactionFilter, only wildcard-party filter, therefore it might require client side filtering
If a client wants to consume both changes triggered by self (completions) and both changes triggered by third parties (needs another flat transaction stream), and wants to have a precise understanding of it's version of the state with the validAt offset, then with a completion being the last event which is not present in the other flat transaction stream, it still cannot compute a validAt (e.g. until other transaction stream emits something with offset equal or greater than the last completion, it cannot know if the completion is the last one) It is not known to me what is the precise use-case for the Triggers, and whether this approach still can be used there or not.

On a desirable solution for the general problem

As we can see from the previous example (the completion+pointwiselookups andother flat stream use case) the completion+pointwise lookups usability is limited, and cannot be considered as recommended/general approach for the problem.
The above suggested solution with flat tx streams sending empty transactions regularly to circumvent the synchronization issues above, still puts lots of involved synchronization related logic to the client, and under certain conditions can be very verbose (even unnecessary)

The ideal solution for the problem would be to provide the clients a unified stream with completions and flat transactions, which pushes all related events for both in one streams in order. So server side merging of streams. AFAIK there is already some conceptual work started to define such for Ledger API (not sure whether with the same name).

Until that work realizes, we could consider to start an initiative/concept/design of Flat Transactions with Completion streaming endpoint, which would be a special case of a totally general unified stream approach. Roughly: we could add a new subscription with requests for both flat and completion streams, and stream the merged (and ordered) stream of those to the client.

A simplified, low API impact of the above would be to simply re-implementing flat-tx by merging completion and flat transactions streams, and emitting empty transactions for all completions without corresponding flat-tx. Since this seemed the original requirement for the flat tx stream, I am guessing this is still correct from the API point of view, but this needs to be verified.

Strange heuristics implemented (bugs?) currently in Ledger API code trying to fulfill partially the problem with empty transactions

For flat transaction stream coming from DB (meaning not from in-memory fan-out), current implementation emits an empty transaction if all visible (for the TransactionFilter) events in the transaction turns out to be transients (both create and archive for the same contract in one transaction), and the transaction was sent from the same participant, and the submitters have intersection with the union of the parties from the TransactionFilter.
For flat transaction stream coming from in-memory fan-out: it seems to not emitting empty transactions, but references in code point to this might be a bug, and similar behavior is wanted probably (not entirely certain, probably more investigation is needed)

The code for the DB version I was able to track down until here: https://github.com/digital-asset/daml/commit/6b851229b8a0e3cac3bf43b20ac947de0cc41aaf which after I've gave up. On any rate this behavior originates from old times (commit has date 2020.03.05), I could not find any documentation describing this behavior on the Ledger API. I have a hunch previously all events were returned, and this was not only constrained for full transient transactions, but all transactions, which would be very similar what is needed here. And accidentally that behavior changed as DB retrieval got more efficient and only visible events were returned. More investigation needed, and unfortunately clients might be already have logic on top of this (empty transactions for certain full-transient transactions), since it is there for at least 2 years.

If someone is reading this far and has an idea why the empty transaction for an otherwise full-transient transaction makes sense, please let us know!

mziolekda commented 2 years ago

Thank you @nmarton-da for great analysis and for checking all corner cases. Is the behavior of the transactionTree streams consistent, or does it also have lots of oddities depending on who is asking and if the stream is coming from buffers or the DB?

cocreature commented 2 years ago

Great summary Marton, thanks!

A few comments:

Triggers do need transactions submitted by other parties and they do use template filtering. So you are completely right that just the completion stream + pointwise queries would not work.
I think you could technically make it work by subscribing to transaction stream, completion stream and doing pointwise queries for completions because we only care about the empty transactions for our own submissions. However, that’s a convoluted on the client side, we’ll fetch transactions we submitted twice (and once unfiltered) and just overall a pretty bad ux.
As for your question on the current emitted empty transactions, it sounds like a broken attempt at fixing the issue discussed here. I don’t think the current behavior there makes sense.
In terms of priorities, I’d say this is overall fairly low in priority at least for me (especially compared to things like interface subscriptions, multi-domain, … which are blockers for cn). We’ve known about this issue for 1.5 years and while every few months someone runs into it, I wouldn’t say it’s a big issue in practice. So I’d be inclined to say, we just accept this atm and go for the full unified event stream rather than trying to add a stream targeted at solving this specific issue.

nmarton-da commented 2 years ago

Current implementation for flat transactions:

if TX killed by filter, then not returning anything (no empty TX neither).
if TX only having transient pairs, and the union of all parties in TransactionFilter has intersection with the TX's submitters then returning an empty transaction.

Suggestion by @mziolekda @cocreature @nmarton-da

keeping the current behavior for persistence queries
fixing IMFO so it behaves the same as persistence
NOT changing Ledger API, so this feature is not explicit
Fixing the original issue with Triggers: this will wait for the unified stream project to have a proper implementation

digital-asset / daml