coral-xyz / anchor

⚓ Solana Sealevel Framework
https://anchor-lang.com
Apache License 2.0

Anchor Events Are Extremely Compute Intensive When Used Without Maximal Compiler Optimizations #863

Closed bonedaddy closed 2 years ago

bonedaddy commented 3 years ago

Overview

With the default compiler optimizations, logging the same information through emit! is approximately 2x more expensive in compute units than logging it directly with msg!.

With maximal compiler optimizations applied as follows, the cost increase is drastically reduced, but emit! is still more expensive.

[profile.release]
lto = "fat"
codegen-units = 1
[profile.release.build-override]
opt-level = 3
incremental = false
codegen-units = 1

While this can solve compute unit usage issues within your own programs, it still leaves programs at risk of reaching compute unit limits when making CPIs to third-party programs that emit events but were built without these compiler optimizations.

As it stands this is more of an edge-case issue. However, if the current event system is kept in place, I think that once more programs are deployed onto Solana and there is more interconnectivity between programs, this will start to become a real issue.

Proposed Solution - Avoid Borsh Serialization And Base64 Encoding Completely

Personally I think the best option is to remove serialization and encoding of events entirely, and to use proc macros to format the event into a JSON string, which is what gets logged. This has the added bonus of being human readable on block explorers.
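As an illustration of what such a proc macro could expand to, here is a minimal sketch in plain Rust. The `MyEvent` struct, its fields, and the `to_json` method are hypothetical, not Anchor code:

```rust
// Hypothetical event struct; in the proposed design, a proc macro
// would generate the formatting code below at compile time.
struct MyEvent {
    message: String,
    number: u64,
}

impl MyEvent {
    // Sketch of the generated code: a single `format!` call, with no
    // Borsh serialization and no base64 encoding.
    fn to_json(&self) -> String {
        format!(
            "{{\"message\": \"{}\", \"number\": {}}}",
            self.message, self.number
        )
    }
}

fn main() {
    let ev = MyEvent {
        message: "foobar".to_string(),
        number: 123,
    };
    // On-chain, this string would be handed to `msg!`.
    assert_eq!(ev.to_json(), "{\"message\": \"foobar\", \"number\": 123}");
}
```

A real macro would also need to escape quotes and control characters inside string fields; this sketch skips that.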

Alternative Solutions

Optimize The Process Of Emitting Events

I'm not exactly sure what this would look like, but there's probably something that can be done. At a quick glance, changing this function to preallocate the vector entirely would help, but I'm not sure if that's possible.
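The preallocation idea can be sketched as follows. This is not Anchor's actual serialization code; the discriminator-plus-payload layout is just an assumption for illustration:

```rust
// Sketch only: when the final size is known up front (8-byte
// discriminator + payload), `Vec::with_capacity` reserves the whole
// buffer in one allocation, avoiding the repeated reallocations that
// growing an empty Vec can incur.
fn serialize_event(discriminator: &[u8; 8], payload: &[u8]) -> Vec<u8> {
    let mut data = Vec::with_capacity(8 + payload.len());
    data.extend_from_slice(discriminator);
    data.extend_from_slice(payload);
    data
}

fn main() {
    let data = serialize_event(&[1u8; 8], b"foobar");
    assert_eq!(data.len(), 14);
    // The capacity was reserved in a single step up front.
    assert!(data.capacity() >= 14);
}
```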

Add Compiler Optimizations To Default Workspace File Generated By Anchor

armaniferrante commented 3 years ago

Have you measured the difference in compute units between JSON vs base64(borsh)?

Henry-E commented 3 years ago

Mango said Anchor events reduced their compute units for logging from 30k units to 4k. What kind of compute unit usage are you seeing?

bonedaddy commented 3 years ago

Have you measured the difference in compute units between JSON vs base64(borsh)?

Ah, perhaps I should clarify: regardless of the serialization method being used, any sort of serialization will probably end up more or less costing the same amount of compute units. I'm not super familiar with proc macros, but I think a proc macro that generates a preformatted message would be cheaper.

Mango said Anchor events reduced their compute units for logging from 30k units to 4k. What kind of compute unit usage are you seeing?

From a quick look through the two branches, one using traditional msg! invocation and the other using Anchor events, it seems that less information is being included in the Anchor events, which would probably explain the reduced compute units.

Examples

Here's an extremely quick PoC I put together. As the program logs show, Anchor events consume more than twice the compute units of simply logging a formatted JSON string:

Streaming transaction logs mentioning Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS. Confirmed commitment
Transaction executed in slot 1:
  Signature: 3a8nZ82HxiCmw8i9Sh2rN73UXPXXUfZbisgd6jjMBNcJNSRGFuEqzRntHACRoUbxB4cxtCvMWYYB3NSHqoyTqdnZ
  Status: Ok
  Log Messages:
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS invoke [1]
    Program log: pev3G58vvc4GAAAAZm9vYmFyewAAAAAAAAA=
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS consumed 2684 of 200000 compute units
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS success
Transaction executed in slot 2:
  Signature: h8qk7juRpRcYefw4B5fKsKtFrL9AfDEQNSoNAFwX4KtiEXh1rWQGBPkRBwJbnoUbPqJkmj6rW1eFq369jvj8igK
  Status: Ok
  Log Messages:
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS invoke [1]
    Program log: {"message": "foobar", "number": 123}
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS consumed 1214 of 200000 compute units
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS success
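For reference, the opaque base64 string in the first log line is an 8-byte event discriminator followed by the Borsh-serialized event fields. A minimal sketch of that layout in plain Rust (not Anchor's code; the discriminator bytes below are placeholders, Anchor derives the real ones from a hash of the event name):

```rust
// Borsh encodes a String as a u32 little-endian length plus the
// UTF-8 bytes, and a u64 as 8 little-endian bytes.
fn encode_event(discriminator: [u8; 8], message: &str, number: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + 4 + message.len() + 8);
    out.extend_from_slice(&discriminator);
    out.extend_from_slice(&(message.len() as u32).to_le_bytes());
    out.extend_from_slice(message.as_bytes());
    out.extend_from_slice(&number.to_le_bytes());
    out
}

fn main() {
    let bytes = encode_event([0u8; 8], "foobar", 123);
    // 8 (discriminator) + 4 (length) + 6 ("foobar") + 8 (u64) = 26
    // bytes, matching the 26 bytes inside the base64 string above.
    assert_eq!(bytes.len(), 26);
    assert_eq!(bytes[8], 6);    // length prefix of "foobar"
    assert_eq!(bytes[18], 123); // first LE byte of the `number` field
}
```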

Here's the same PoC with more data being logged; approximately twice the compute units are still being used:

Streaming transaction logs mentioning Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS. Confirmed commitment
Transaction executed in slot 1:
  Signature: 5QNVvaZpkk1xXHazr2K4MJ8SS6ezvBmeArvaPY9DHpX4yRMXzz28K7986nLViY1DWxtgtzToBfaFg2i5hjFMtRAX
  Status: Ok
  Log Messages:
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS invoke [1]
    Program log: pev3G58vvc4GAAAAZm9vYmFyewAAAAAAAAADAAAAYmF6AwAAAGJhcgEBAAAAYQEAAABi
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS consumed 4769 of 200000 compute units
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS success
Transaction executed in slot 2:
  Signature: 5uKfhp57JUzVzTPosSZwT6uaAZeuQpsgZfdSEhwbuRAVXayqiEQzF38Cra1vybh9z9MugNZ1SJXbWa6zJVNx9bXa
  Status: Ok
  Log Messages:
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS invoke [1]
    Program log: {"message": "foobar", "number": 123, "such": "baz", "compute_units": "bar", "very_wow": "true", "a": "a", "b": "b"}
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS consumed 2309 of 200000 compute units
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS success

Applying the following compiler optimizations drastically reduces the discrepancy, but it doesn't really solve the issue, since it assumes everyone will deploy their Anchor programs with these optimizations added.

[profile.release]
lto = "fat"
codegen-units = 1
[profile.release.build-override]
opt-level = 3
incremental = false
codegen-units = 1

This gives the following compute unit usage:

Streaming transaction logs mentioning Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS. Confirmed commitment
Transaction executed in slot 1:
  Signature: 3L68jUnKw7XZaggryA8UfH9nXBwexXsJtQgj2XCBuSGzG6crsuFxWtgQE6dWdhTedTdBx825sv3nKkvZ9akMAZ8e
  Status: Ok
  Log Messages:
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS invoke [1]
    Program log: pev3G58vvc4GAAAAZm9vYmFyewAAAAAAAAADAAAAYmF6AwAAAGJhcgEBAAAAYQEAAABi
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS consumed 2136 of 200000 compute units
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS success
Transaction executed in slot 2:
  Signature: 5izscJ9wTFnup8gxYyVponLVpgRuGhvEGfwbbaSa3KDmBMz5ZJ1ojH3jaSN3C1gXKDttCtWwrGecmTU2ZW9EGpqW
  Status: Ok
  Log Messages:
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS invoke [1]
    Program log: {"message": "foobar", "number": 123, "such": "baz", "compute_units": "bar", "very_wow": "true", "a": "a", "b": "b"}
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS consumed 1976 of 200000 compute units
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS success
armaniferrante commented 3 years ago

any sort of serialization will probably end up more or less costing the same amount of compute units.

Can you clarify the original issue in this context and the suggested fix? Programs don't need to use the emit! macro if they find it too costly, and if there's a way to make it more efficient, we should do it.

bonedaddy commented 3 years ago

Can you clarify the original issue in this context and the suggested fix?

Sure, I'll update the original issue.

Programs don't need to use the emit! macro if they find it too costly, and if there's a way to make it more efficient, we should do it.

That's what we've personally done with our programs for SolFarm, but that only puts a band-aid on the issue, and it only truly works as a solution if your program never CPIs to other programs, or at least never to programs that use the emit! macro.

Henry-E commented 3 years ago

Based on your program logs, you're roughly saying that emitting the serialized data is 2k units whereas msg! logging a preformatted JSON string is only 1k units.

It definitely sounds like a good argument for just printing preformatted JSON strings (i.e. don't let the msg! call do the JSON formatting, as Mango used to). JSON strings are more human readable too, though they don't give as strong a guarantee that the data will follow the IDL. Maybe a solution could be for Anchor to format its event structs into a JSON string and emit that instead of a serialized struct.

Though honestly, 1k units vs. 2k units might seem like a big difference percentage-wise, but in absolute terms it's fairly insignificant. When you use msg! by itself, how much lower can it go below 1k units, and is this something Anchor can even fix?

Also, some relevant questions: are you using a lot of these calls? Would it be possible to consolidate them into fewer calls?


fanatid commented 3 years ago

@bonedaddy can you measure instructions count with this PR? https://github.com/project-serum/anchor/pull/870

bonedaddy commented 3 years ago

Based on your program logs, you're roughly saying that emitting the serialized data is 2k units whereas msg! logging a preformatted JSON string is only 1k units.

To clarify, this is with compiler optimizations applied, which isn't always going to be the case. Without compiler optimizations it's 4k, not 2k.

Though honestly, 1k units vs. 2k units might seem large percentage wise but in absolute terms it's more insignificant.

Right, it's negligible in this context, where the amount of data being emitted is tiny and there's nothing else going on. But once you start interacting with other programs via CPI, emitting multiple events, and getting into complex DeFi protocols, it's not negligible at all.

Up until recently, the programs we've integrated with have emitted a small enough number of events that the compute units used by events haven't been problematic. However, when Saber migrated their farms to Quarry, which emits a number of events, the compute units consumed by events started to become problematic.

When harvesting rewards from Quarry, 3 events are emitted:

1:

0Q/jA6yPVSDIdBbhXCLvltw50ZnjDx2hzw74NVn49kmpYj2VZHQJoeJoYJqaKcvuxCi/2i4yywedcVNDWkM84Iuw+cEn9/RO9XIsCgAAAABekW0CBY6JdAnNHugdhAYIn3bc02X5Bd3Knr5atolzZA==

2:

0Q/jA6yPVSDIdBbhXCLvltw50ZnjDx2hzw74NVn49kmpYj2VZHQJoeJoYJqaKcvuxCi/2i4yywedcVNDWkM84Iuw+cEn9/ROrkIAAAAAAAAzLlBGju5fSXlMaaQr6xpNwj+ObGjudYJMFWBICs4aSg==

3:

XQ9GqjCM1NvQRUnAelAgZHG7mfT0auJVqc42SliyLJ5mi/zqETpgnAkWWUpkXF5787vWAVtrlruEZGZ1Wdppm7aySu3iC1+JCrXY4qBFI9dveEERQ1c4kdU46xjxj9Vi+QXkb2Kx45T1ciwKAAAAAK5CAAAAAAAAquNjYQAAAAA=

These logs include roughly 2x the number of characters of the log from my PoC. For argument's sake, let's say this implies a 2x increase in the amount of compute units.

Quarry doesn't include the compiler optimizations, so that means one event consumes 8k compute units, for a total of 24k compute units across all three events, or 12% of the available compute units. If Quarry were built with the compiler optimizations, it would be half the cost, so 6% of the available compute units. If the events were logged as JSON, it would amount to around 3% of the compute units.

In the case of Saber's Quarry farms, the token you get from harvesting is an IOU which needs to be redeemed by another program, which also emits an event of roughly equal size. Using the hypothetical estimates so far, harvesting plus redeeming rewards without optimizations consumes a total of 32k compute units, or 16% of the available compute units.
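The back-of-the-envelope arithmetic above, made explicit (all figures are the rough estimates from this thread, not measurements):

```rust
fn main() {
    let budget: u64 = 200_000;  // per-instruction compute budget
    let per_event: u64 = 8_000; // estimated cost per event, unoptimized

    // Harvesting from Quarry emits 3 events.
    let harvest = 3 * per_event;
    assert_eq!(harvest, 24_000);
    assert_eq!(harvest * 100 / budget, 12); // 12% of the budget

    // Redeeming the IOU emits roughly one more event of equal size.
    let total = harvest + per_event;
    assert_eq!(total, 32_000);
    assert_eq!(total * 100 / budget, 16); // 16% of the budget
}
```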

Hypothetical estimates aside, the number of events emitted from the harvest and redeem steps was enough that we had to adjust our vaults to do this in two separate transactions. Even if Quarry were deployed with compiler optimizations, we would be right on the edge of compute unit consumption and would likely still have to do this in two steps. Events emitted as JSON would allow this to happen in one transaction.

Solana DeFi is just getting started, and with Anchor becoming increasingly popular, it's reasonable to assume that events are going to be used more and more. If DeFi on Solana is going to behave as it does in other ecosystems, this means many different protocols interacting with each other. So while on a small scale (i.e. a program being used by a web UI, or in this PoC) I agree the cost savings don't matter that much, once you start considering the broader picture of a Solana ecosystem with hundreds of dapps interacting with each other, it does matter.

At present I believe the compute unit budget applies per instruction, so the easiest workaround is to simply include multiple instructions when you create a transaction. However, considering there is a change coming to Solana that will apply the compute unit budget transaction-wide, this is only a short-term solution.

Also relevant questions. Are you using a lot of these calls? Would it be possible to consolidate into less calls?

Apologies if I wasn't clear: we don't actually use events in SolFarm. The issue comes from needing to interact with other programs that use events.

bonedaddy commented 3 years ago

@bonedaddy can you measure instructions count with this PR? #870

Yup, I'm actually about to sign off for the night so I'll do this in the morning.

Henry-E commented 3 years ago

These logs include roughly 2x the number of characters of the log from my PoC. For argument's sake, let's say this implies a 2x increase in the amount of compute units.

Does compute cost actually increase linearly with the amount of text being printed? This isn't something I've heard of before.

Solana DeFi is just getting started, and with Anchor becoming increasingly popular, its reasonable to assume that events are going to be used more and more.

No one's arguing against you on these points, just trying to accurately gauge what the problem actually looks like in order to have a strong starting point for figuring out potential solutions. Events and good logging are super important for the whole ecosystem!

bonedaddy commented 3 years ago

Does compute cost actually increase linearly with the amount of text being printed? This isn't something I've heard of before.

I'm not sure to be honest, I just used that as a guesstimate. However, even if it doesn't scale linearly and it's only a 25%, 15%, or just a 10% increase, the compute units consumed by events become problematic the more complex your program gets. Even without complex programs, emitting one too many events is enough to cause problems (e.g. the harvest + redemption example I mentioned previously).

No one's arguing against you on these points, just trying to accurately gauge what the problem actually looks like in order to have a strong starting point for figuring out potential solutions. Events and good logging are super important for the whole ecosystem!

Events are very important indeed, but the compute units currently consumed by events will not scale to more complex protocols that involve interacting with many other protocols. It's entirely possible to work around this now, but once transaction-wide compute limits are implemented, it will be a blocker.

bonedaddy commented 3 years ago

@fanatid I got the following error:

    'Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS invoke [1]',
    'Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS consumed 198140 of 200000 compute units',
    'Program failed to complete: invalid utf-8 sequence of 1 bytes from index 0: [165, 235, 247, 27, 159, 47, 189, 206, 112, 101, 118, 51, 71, 53, 56, 118, 118, 99, 52, 71, 65, 65, 65, 65, 90, 109, 57, 118, 89, 109, 70, 121, 101, 119, 65, 65, 65, 65, 65, 65, 65, 65, 65, 68, 65, 65, 65, 65, 89, 109, 70, 54, 65, 119, 65, 65, 65, 71, 74, 104, 99, 103, 69, 66, 65, 65, 65, 65, 89, 81, 69, 65, 65, 65, 66, 105]',
    'Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS failed: Program failed to complete'
fanatid commented 3 years ago

@bonedaddy not sure why the version with Vec<u8> doesn't work :thinking: I've reverted it, can you try one more time?

bonedaddy commented 3 years ago

Unfortunately, that made compute unit consumption worse :sob:

Compiling with optimizations:

  Log Messages:
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS invoke [1]
    Program log: pev3G58vvc4GAAAAZm9vYmFyewAAAAAAAAADAAAAYmF6AwAAAGJhcgEBAAAAYQEAAABi
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS consumed 7952 of 200000 compute units
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS success
Transaction executed in slot 2:
  Signature: 63CfsfVJqWqnQCcuNK1DnnYDH3exLWYxja3zqy9VFrCTUr9upft8PLod6jmMnBq1PEj3amAsPcSh46dt9UxdQWXo
  Status: Ok
  Log Messages:
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS invoke [1]
    Program log: {"message": "foobar", "number": 123, "such": "baz", "compute_units": "bar", "very_wow": "true", "a": "a", "b": "b"}
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS consumed 1976 of 200000 compute units
    Program Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS success
fanatid commented 3 years ago

Interesting. Thanks for checking!

armaniferrante commented 3 years ago

A new syscall API was added recently, which hopefully addresses some of these concerns. See https://github.com/solana-labs/solana/pull/19764.
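For context, sol_log_data takes raw byte slices, so the program no longer pays for base64 encoding on-chain; any encoding for display happens off-chain. A stub sketch of the call shape (the real syscall is solana_program::log::sol_log_data and only exists in the on-chain environment; this stand-in just mirrors its signature):

```rust
// Stub mirroring the shape of `sol_log_data(data: &[&[u8]])`: the
// program hands the runtime a slice of raw byte slices and does no
// encoding work itself. Here we just sum the bytes that would cross
// the program/runtime boundary.
fn sol_log_data_stub(data: &[&[u8]]) -> usize {
    data.iter().map(|d| d.len()).sum()
}

fn main() {
    let discriminator = [0u8; 8]; // placeholder discriminator
    let payload = b"event payload";
    let total = sol_log_data_stub(&[&discriminator[..], &payload[..]]);
    assert_eq!(total, 8 + 13);
}
```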

bonedaddy commented 3 years ago

Interesting, that looks very promising.

armaniferrante commented 2 years ago

What Mango is doing to make this more memory efficient: https://github.com/blockworks-foundation/mango-v3/pull/134/files

paul-schaaf commented 2 years ago

@bonedaddy would you mind benchmarking again with the master branch? It now uses the sol_log_data syscall.

paul-schaaf commented 2 years ago

I made some benchmarks for Anchor 0.24.2:

repo: https://github.com/paul-schaaf/emit-bench (master has the custom profile, the without-profile branch does not)
results: https://gist.github.com/paul-schaaf/4742a7c474c38807918b2f7e6466b796

summary:

however this doesn't really solve the issue because it means assuming everyone will deploy their anchor programs with these optimizations added.

We can add the custom profile to our CLI templates so that every Anchor project going forward uses it.