Open warner opened 9 months ago
- ERTP: replace the Purse with an NFT-specialized one that uses a MapStore ...
- requires some sort of upgrade-Purse-in-place remediation scheme to be useful for the existing KREAd contract
I think it's plausible that we can make new purses and transfer the assets, without any in-place-upgrade magic.
Purses tend to be closely held. I think the walletFactory is the only importer of any purses:
earlier discussion: https://github.com/Agoric/agoric-sdk/pull/8861#issuecomment-1925409909
https://github.com/agoric-labs/KREAd/tree/warner/8862-benchmarking has my benchmarking code in the last commit. cc @erights
ERTP: replace the Purse with an NFT-specialized one that uses a MapStore ...
- requires some sort of upgrade-Purse-in-place remediation scheme to be useful for the existing KREAd contract
I think it's plausible that we can make new purses and transfer the assets, without any in-place-upgrade magic.
Purses tend to be closely held. I think the walletFactory is the only importer of any purses:
- vbank purses are fungible and short-lived, created afresh for each offer.
- invitation purses and KREAd purses: walletFactory could voluntarily make new ones and transfer the assets.
earlier discussion: #8861 (comment)
Yeah, I think the only Purse with a non-trivial size is Zoe's internal localPooledPurse
, which holds all the Characters that are escrowed for open seats. Because this issuer is a zcfMint
, this Purse lives in vat-zoe, not vat-kread, and is never exposed outside that vat (although it's probably held in zoe's baggage too, for benefit of upgrade).
If the ERTP change is to implement a new more-efficient Kind for new Purses (noticing the AmountMath is non-fungible and using a MapStore for state.currentBalance
), then on each upgrade (and for each zcfMint issuer), Zoe could ask ERTP for a new empty Purse, get one of the latest Kind, and drain the old one into it. O(numIssuers) work on each Zoe upgrade.
A wallet user who acquires a few hundred characters (or Items; I haven't investigated them very much) could have the same problem.. I'm not sure where to stand to have the wallet decide that it should ask the zoe-based Issuer for a replacement Purse.
I'm not sure where to stand to have the wallet decide that it should ask the zoe-based Issuer for a replacement Purse.
in an upgrade of walletFactory, during the 1st crank, much like...
If there's a desire to, we could make the KREAd contract a lot more performant.
We need to make large NFT purses a lot more efficient. The case of the Zoe escrow purses is particularly concerning
Summary: ERTP's large-cardinality NFT Amount handling is O(N) expensive, and KREAd triggers it
I spent some time investigating the mainnet slowdown on tuesday 23-jan-2024, specifically the KREAd
MintCharacter
wallet action in block 13,499,082 . Thepackages/SwingSet/misc-tools/monitor-slog-block-time.py
, when fed the slogfiles of a follower (in this case my follower namedagreeable
), prints one line per block, with inter-blockTime
and some swingset data:which indicates that we had elevated block times (11s/29s/21s/29s, instead of the usual 6s), four non-empty blocks in a row, and about 25 validators fell behind (they were unable to vote in time to be counted). It took about ten blocks for them to catch up.
Slogfile Investigation
Feeding the same slogfile into my
classify-runs.py
tool (which will eventually land inpackages/SwingSet/misc-tools/
) summarizes this action as:This action required a total of 493,433,524 computrons, which exceeds our currently configured maximum of 65Mc/block, so it spilled over into multiple blocks. The action started as the 6th run (
r5
) of the first block, consumed the second and third entirely, and finished in the fourth. The four runs combined required 37.98 seconds of SwingSet computation time (on the follower from which I took this sample). The four blocks, including actions that shared the first and last blocks, looked like:I used the same tool to scan for all
kread-mint-character
actions in the chain's history, and noticed that the number of computrons is growing nearly linearly with the number of characters minted:classify-runs.py
also extracts the slogfile for the four combined runs. From this I found that the first block was ended by a 137McmintAndEscrow()
delivery to v9-zoe (crankNumc44159622
), causing the block to finish with a total of 148,328,225 computrons (i.e. it spend about 11Mc in smaller deliveries first, then the largemintAndEscrow()
finished, then therunPolicy
scheduler observed the computron limit was exceeded and ended the block, leaving the remaining work for the next one). On my follower, this delivery took 10.95 seconds.The second block was ended by
c44159626
, which was another 137McmintAndEscrow()
, taking 10.84 seconds. The third was ended byc44159676
which was a 153McexitSeat()
to v9-zoe that took 12.03 seconds.I fed these three deliveries into my
find-slowness.py
tool to identify gaps between slog events that exceed one second. This reveals bursts of computation that happen between the vat worker's receipt of the results of one syscall, and its emission of the next syscall. We can use this to deduce where acute CPU effort is being spent. It would not help us identify chronic CPU usage (small delays interspersed over a large number of syscalls), however none of these deliveries had a significant number of syscalls: they had 113/113/116 syscalls respectively.The
mintAndEscrow()
deliveries both had gaps of 2.5s, 5.0s, and 3.2s , the first of which occurred immediately after avatstoreGet()
of a VOM state record (v9o+d391/2
) that was unusually large, about 200kB. This virtual/durable object turns out to be aPurse
, of theKREAd CHARACTER
Brand, whoseAmount
contained 380 separate records. So this purse contains 380 NFTs, and our code to manipulate these sorts of balances turns out to be expensive, and also O(N) in the number of records/characters contained therein. The act of minting a character needs to interact with this large Purse several times (described below), and the elevated computron and elapsed time consumed are a result of that expense. The surprisingly expensive operations include:AmountMath
on the large balance:add
,subtract
,isGTE
toCapData
,fromCapData
) needed by liveslots to fetch/store the VOM stateBuilding a Benchmark
To investigate further, I adapted the KREAd
test-minting.js
unit test into a benchmark tool. I removed the error-case-testing portions and stripped out everything but the minting of a new character. I also increased the number ofBaseCharacters
(since eachmint
operation consumes one), and added a configurable loop to mint multiple ones. I added a bunch ofconsole.time()/timeEnd()
instrumentation, and enabled V8's profiling during the final loop, so we could get a flamegraph of most expensive pass.To get results that are closer to how XS/xsnap would work, I made some additional changes:
LOCKDOWN_OPTIONS='{"__hardenTaming__":"unsafe"}' node test/bench-minting.js
, because XS has a nativeharden()
, so we aren't unfairly penalizing V8 for lacking oneharden()
with a NOPharden()
a lot, so this might overcompensatenode_modules/@agoric/internal/src/storage-test-utils.js
to makemakeMockChainStorageRoot
pass through options tomakeFakeStorageKit
:so I could have the KREAd test
bootstrapjs
override the default ofsequence: true
:because
sequence: true
turns the O(1) chain-storageset()
calls into O(N)append()
ones. The real chain only remembers the latest value for each chain-storage cell, whereasstorage-test-util.js
was designed to support unit test cases that benefit from seeing all historical items too.Both the KREAd code and agoric-sdk utilities use non-immediately-
await
ed Promises, which makes CPU profiling more difficult: if A triggers an expensive B, but does notawait
it before continuing on to C, the flamegraph may make it look like C is the source of the expense, not A+B. So I inserted a bunch ofawait
calls which aren't strictly necessary for correctness, but helped tremendously for performance analysis. I also insertedstall()
calls (which is just a loop that doesawait null
twenty times in a row) in places where I suspected remaining overlap (code which doesn'tawait
at all, where a synchronous function fires off an async call and doesn't give the caller anything to synchronize upon).Note that liveslots flushes the VOM state-data cache at the end of every crank, to reduce GC-influenced nondeterminism. This requires us to re-load and re-unserialize the
state
data of any virtual object the first time it is referenced in any crank. In contract, thefakeLiveslots
environment never ends a crank, so it never flushes this cache, so it never needs to unserializestate
objects. This means the V8 benchmark run will lack some expensive deserialization time which actually exists on the mainnet (liveslots/XS) slog traces.Finally, the V8 run has an expensive initial call to
getRandomBaseIndex
, which is an artifact of how thefakeLiveslots
environment is built. This call produces a list of all 800-ishBaseCharacter
keys, in order, so it can select a pseudo-random one (although, really, the keys are integers, so it ought to be able to do this in O(1) without enumeration). The enumeration performs a series ofvatstoreGetNextKey()
calls. On the real chain, where swing-store is backed by SQLite, this uses a normal DB index and is fast. In the fake environment, the vatstore is backed by a plain Map (which, being JS, enumerates in insertion order), plus a side Array to track the sorted order. The act of re-sorting that side array (too much) cause a perf hit that does not represent what the real chain would do. So we ignore that initial burst of activity when looking at the data.Benchmark Results
The resulting flamegraph, for a single
mint
operation, looks like this:Eventually, I was able to correlate the long CPU times in the (V8 benchmark) flamegraph with the long gaps in the (real-chain XS) slogfile record, and deduce the sequence of events that took place. As a rough measure,
The real-chain XS operations (with 380 characters in the pooled purse) took about 95x the time needed by the V8 benchmark (with 800 characters). Compensating for the 380-vs-800, that makes the chain about 200x slower than the benchmark. This is a combination of my benchmark machine being faster than my follower, the XS engine being slower than V8 (no JIT), syscall overhead, and work done by XS that was not done by the benchmark (like deserializing the large purse on every crank).
How KREAd Works
First, some background on how the KREAd contract operates (as reverse-engineered by me; I may have it entirely wrong). It manages "Characters" and "Items", which can be owned and traded by users. Characters can also be "equipped" with Items, basically causing the item to be owned-by / associated-with the character, rather than a user. There are a limited number of Characters that can be minted (about 800 I think), of which about 380 have been minted so far. Minting a character costs a fee, which is split into "royalty" and "platform fee", and deposited into two
depositFacets
. The newly-minted character is automatically equipped with three Items. The equipped Items of each Character are held in that character's "Inventory".At startup, the KREAd contract asks ZCF to create a
zcfMint
for two denominations (KREAd CHARACTER Brand
andKREAd ITEM Brand
). This causes v9-zoe to creates an ERTP Issuer for each, with apaymentLedger
(mapping empty Payment objects to their balance), arecoverySet
for those payments, and amintRecoveryPurse
to hold the recovery set for the mint-generated Payment objects. We can use the syscalls that manipulate these tables as a way to trace the execution inside v9-zoe from just the slog events.Internally, each character's inventory is managed by a separate
inventorySeat
: a regular Zoe/ZCFSeat
. This seat is created for each character when the character is minted, held internally by the contract (in a table indexed by the character name), and never exited or deleted.Equipping an Item causes the item to be allocated to the
inventorySeat
, just like how fees are allocated to their recipients. ZCF/Zoe know about all the current allocations, but holds the assets in question in a zoe-side escrow purse until the seat is exited. Zoe creates a single sharedlocalPooledPurse
for each denomination, and merges all compatible amounts into that same purse, just like a regular bank pools all fungible customer deposits into a single vault and just uses record-keeping to track who owns what.Users are only allowed to equip items they own, to characters they own. To demonstrate this authority safely, and to work around some limitations of Zoe (no wash trades), KREAd uses an unusual pattern. When initially minting the character, it actually creates two character Amounts, which I'll call the A character and the B character (think of this like the main actor and their nearly-interchangeable stunt double, or the two buffers in a double-buffered graphics rendering pipeline). These Amounts differ in only a single field: A gets
{ keyID: 1 }
while B gets{ keyID: 2 }
. The user gets back A: it is allocated to theseat
that triggered themint()
handler, which will be exited by the contract, and then the user will callgetPayouts()
and will get a Payment for character A, which they'lldeposit
into their own Purse. But character B is allocated to the newly-createdinventorySeat
, along with the three newly-minted Items. It sits there for an indefinite time (until anequip
orunequip
action occurs).The protocol for equipping an item to a character requires the user to determine which A/B character Amount the user holds, vs which one is in the
inventorySeat
. If the user holds A, then the offer they submit looks like:The contract the tells ZCF to transfer the
inventorySeat
's charB back to the user, and to transfer charA and the item into theinventorySeat
. If the user held B, then these are reversed: the user gives character B and the item, and wants character A.So now that 380 Characters have been minted on the chain, there are 380 "A" amounts, and 380 "B" amounts. For each pair, one of the two is owned by some user purse (held only by vat-wallet), and the other is allocated to a unique
inventorySeat
(held in the character table), but owned by thelocalPooledPurse
in v9-zoe (held on behalf of the contract, to escrow all assets currently allocated to all open seats).Note: if Zoe allowed wash trades (which we considered, but decided against because it's more likely to be an error than a legitimate use), the protocol could instead be "give A+Item, want A", and KREAd wouldn't need this A-vs-B duplication. Or, if we had better notions of "use objects" (eg @erights' recent "owned amounts" work), then
equip()
would be a method on an object only available to the current owner of the character, and perhaps wouldn't need a Zoe offer to execute.Performance Issue
Anyways, the performance issue arises from these B characters and where they are held. The collection of open
inventorySeat
s hold on to 380 character records, all of which are held by a single zoe-sidelocalPooledPurse
.Zoe's escrow service was designed to be short-lived (as were seats in general: it makes sense to hold a seat open until a trade/offer is fulfilled, but they aren't meant as a long-term storage tool). And it was optimized for a moderate amount of things being escrowed. Zoe merges all escrowed assets of the same Issuer in a single
localPooledPurse
. For fungible tokens, this is an obvious performance win: adding two IST or BLD amounts is trivial, and the result requires only a single variable to track.However for non-fungible tokens, the ERTP
Amount
s arecopyBag
s. These nominally behave like a Map whose keys are character records and whose values are integers (i.e. 2 apples and 3 oranges gets you{ apple: 2, orange: 3 }
. (If the amounts are unique, it could use acopySet
, which is like acopyBag
where the values are always1
, but it wouldn't help here). TheAmountMath
code provides the same operations for these Amounts as it does with plain numbers: you canadd
two amounts,subtract
them, queryisGTE
, etc. But these operations are a lot more complex than adding/subtracting/comparing numbers: they must do Set math, where "add" is really a union, and "isGTE" is really a form of intersection. So they take significantly longer, and this expense grows with the number of items in the Amount.So when Zoe is told to escrow a newly-minted A character and allocate it to the user's
seat
, it mustadd
this value into the existinglocalPooledPurse
for Characters, which already has a B character for every Character that's ever been minted (380 on mainnet). In my benchmark run, on V8, when there are already 800 characters in this purse, it takes 70ms to perform thisadd
.Much of this time is spent sorting the entries into a stable order, and asserting that they are all of the right type, to meet the requirements of a
copyBag
(which exists because a normal Map cannot be serialized into durable storage, or used as a key in a table). There are a tremendous number ofpassStyleOf()
calls in this portion of the flamegraph.In addition, the subsequent
updatePurseBalance
requires Zoe to change thestate.currentBalance
of thelocalPooledPurse
virtual object, which requires serializing this many-valued Amount. In the benchmark run, this needs 35ms just to check that the value matches the declaredstateShape
for the property in question, and then another 10ms to perform the serialization (our userspace marshalling code is not nearly as fast as a native JSON implementation).All of this work is what consumes 137Mc in the zoe
mintAndEscrow()
delivery, and ends the first block. In the V8 benchmark run, we can see themintAndEscrow()
function takes 118ms, out of whichadd
takes 70ms andupdatePurseBalance
takes 45ms (leaving 3ms for other small stuff).In the mainnet slog record, we cannot directly tell which function is being called when, but from the syscalls we can deduce some dividing points and correlations: The total delivery time for the first
mintAndEscrow
(the last delivery of the first block) was 10.051s, which is broken down into three big parts:add()
updatePurseBalance()
(checking/serialize)It must do this all again for the B character, taking about the same amount of time, which ends the second block. At this point, both A and B characters are in the pooled purse.
The third block is ended by the
zoe.exitSeat()
method spending similar time withdrawing the A character from the pooled purse, so it can be transferred to a Payment for the user. This does asubtract
instead of anadd
, preceded by anisGTE()
check that is not necessary in thedeposit
case.The fourth block contains the remaining work, like depositing the royalty/platform fees, which are fast because IST/BLD/etc is entirely fungible. On chain, this took just under a second (
b13499085-r0
took 13.7Mc and 970ms).(The V8 benchmark flamegraph has an artifact, the long
deposit
section at the end, which is where the benchmark test is depositing the A character into the test user's purse. This is slow because the test only has a single user, so that Purse already contains 800 characters. On chain, most users have only one character each, so their purses are small, and when the wallet deposits the character in the user's private Purse, the time toadd
/subtract
/etc is insignificant).Directions To Pursue
This leads to a number of interesting questions, and ways we might improve the situation. First, the concerns:
localPooledPurse
are slow, and will get slower as more characters are mintedmint
(3 expensive things:add
+add
+subtract
),equip
/unequip
(2 expensive things:add
+subtract
), and buy/sell operations (probably oneadd
orsubtract
each)baseCharacters
are minted (so 800/380= 2.1x worse than the current 90s slowdown)expense = k * (numMinted + numForSale)
baggage
well), we do not have tests to demonstrate that it can be upgraded safely, so any contract-side fixes that might improve this behavior may be hard to deploy, first requiring engineering work to make the contract fully upgradeableThen, potential fixes we might pursue.
checkStatePropertyValue
assertion on vatstore read (in thestate
getter), under the theory that we will never write non-conformant state data into the vatstore on the setter side. This would save about 20% of themintAndEscrow()
time.passStyleOf
by giving it a realWeakMap
, instead of theVOAwareWeakMap
that is imposed on all userspace codecopyBag
MAX_DBKEY_LENGTH
in collectionManager.js, so the full (serialized) character record could be used as keys of the MapStore (#8864)Purse
for each character, and hold the equipped Items in that Purse, instead of in an open SeatinventorySeats
as remediation, perhaps one character at a time (as user actions touch them)equip()
protocol to return the same charactergive.CharacterKey = want.CharacterKey
inventorySeat
as remediationcheckBagEntries()
orcheckNoDuplicateKeys()
add()
does it once, maybeset()
then does it again on the same objectequip()
protocol cries out for "Use Objects" or "Owned Facets" like what @erights is working on, where the right to sell a Character is semi-independent of the right to talk to it (i.e. tell it to equip an item)equip()
would be a method on the Character object (or it's use-object facet, or something), rather than its own contract-level invitationequip()
would transfer ownership of the Item from the user to the character itself, in an internal PurseAnd ways in which this should inform other efforts:
can we add a benchmark test to the contract-readiness checklist to discover problem like this before deployment, so governance voters can make an informed choice?
the KREAd code is doing something unusual (holding Seats open forever) but not malicious, how can we build an execution-fee model that correctly charges for this?
vatPowers.getComputronsUsed()
would be appropriatecc @Chris-Hibbert @erights @dckc