Open alnoki opened 2 weeks ago
⏱️ 46s total CI duration on this PR
Job | Cumulative Duration | Recent Runs |
---|---|---|
permission-check | 12s | 🟥 🟥 |
permission-check | 12s | 🟥 🟥 |
permission-check | 12s | 🟥 🟥 |
permission-check | 10s | 🟥 🟥 |
Thanks for working on this, will check in details what's applicable here.
To note, perf testing was done on top of https://github.com/aptos-labs/aptos-core/pull/13194, which optimizes a few (but not all) things covered here as well.
We have not-yet-landed improvements to FA here.
Our metodology for testing throughput is a testsuite/single_node_performance.py benchmark, which with labels can be included on any PR (though for FA, also we need to enable APT FA by default) We made a set of benchmarks here - APT FA, with concurrent balance being default, and with it being opt-in.
We generally don’t focus on reducing gas per-se, but increasing throughput. From our experiments, biggest difference is how many resources are being read/accessed, and main regression here comes from needing to access ConcurrentFungibleBalance. Will benchmark your changes on top of the above, and see if we can get more out of it.
Thanks for looking into this!
biggest difference is how many resources are being read/accessed
@igor-aptos If that's the case and if exists
calls are cheap, then I'd suggest re-architecting structs to look like this:
#[resource_group_member(group = aptos_framework::object::ObjectGroup)]
/// The store object that holds fungible assets of a specific type associated with an account.
struct FungibleStore has key {
/// The address of the base metadata object.
metadata: Object<Metadata>,
/// The balance of the fungible metadata.
balance: u64,
/// If true, owner transfer is disabled that only `TransferRef` can move in/out from this store.
frozen: bool,
}
#[resource_group_member(group = aptos_framework::object::ObjectGroup)]
/// The store object that holds fungible assets of a specific type associated with an account.
struct ConcurrentFungibleStore has key {
/// The address of the base metadata object.
metadata: Object<Metadata>,
/// The balance of the fungible metadata.
balance: Aggregator<u64>,
/// If true, owner transfer is disabled that only `TransferRef` can move in/out from this store.
frozen: bool,
}
Then you can use algorithm flow like this:
/// Deposit `amount` of the fungible asset to `store`.
public fun deposit<T: key>(store: Object<T>, fa: FungibleAsset)
acquires FungibleStore, DispatchFunctionStore, ConcurrentFungibleStore {
// Declare local variables.
let store_addr = object::object_address(&store);
if (exists<ConcurrentFungibleStore>(store_addr)) {
// Do all operations on concurrent store only.
} else {
if (features::migrate_to_concurrent_fungible_balance_enabled()) {
// Migrate to concurrent fungible balance: here you have to borrow both the serial
// store and the concurrent store, but only for one operation.
} else {
// Do all operations on serial store only.
}
}
}
General
In #11183 @igor-aptos raised the issue of performance regression, and @movekevin is recommending against inlining per https://github.com/aptos-labs/aptos-core/pull/11183#issuecomment-2093873910 despite the importance of inlining for performance in critical code paths as I've mentioned in https://github.com/aptos-labs/aptos-core/pull/11183#issuecomment-2094325450 and @georgemitenkov has mentioned in https://github.com/aptos-labs/aptos-core/pull/11183#discussion_r1520679574.
I've flagged the importance of compiler optimizations at https://github.com/aptos-foundation/AIPs/issues/412, and in their absence, optimizations must be performed manually. For critical code paths, like APT transfers which are perhaps the most common txn in the ecosystem, manual optimizations/inlining can substantially reduce bytecode inefficiencies generated by sub-optimal abstractions.
Hence this PR optimizes the core execution paths for
deposit
,withdraw
, andtransfer
functions, which I notice have substantial inefficiencies:Note that this is only a subset of functions that are used in production; should the optimizations prove meaningful during benchmarks then other functions would require optimizations too.
Assorted additional fixes
deposit
andwithdraw
functions both incorrectly emit the error codeESTORE_IS_FROZEN
if the store doesn't exist, but they should emitEFUNGIBLE_STORE_EXISTENCE
. Fix accordingly in the optimized refactors.ensure_store_upgraded_to_concurrent_internal
which was recently de-inlined, remove an erroneous trailing comma that threw a compiler warning.test_fungible_asset_upgrade
.change_feature_flags
tochange_feature_flags_for_testing
.create_aggregator_with_value
inupgrade_to_concurrent
, which was resulting in an aggregator overflow.get_migrate_to_concurrent_fungible_balance_feature
call for feature set, since its omission was preventing concurrent FA store upgrade.Regression
@aching @davidiw @igor-aptos how do these optimizations affect the current benchmark regressions?
(Note that I am not aware of the current benchmark methodology, but I am glad to extend this to other functions if the benchmarks rely on different call paths)