alnoki commented 2 weeks ago

General

In #11183 @igor-aptos raised the issue of performance regression, and @movekevin is recommending against inlining per https://github.com/aptos-labs/aptos-core/pull/11183#issuecomment-2093873910 despite the importance of inlining for performance in critical code paths as I've mentioned in https://github.com/aptos-labs/aptos-core/pull/11183#issuecomment-2094325450 and @georgemitenkov has mentioned in https://github.com/aptos-labs/aptos-core/pull/11183#discussion_r1520679574.

I've flagged the importance of compiler optimizations at https://github.com/aptos-foundation/AIPs/issues/412, and in their absence, optimizations must be performed manually. For critical code paths, like APT transfers which are perhaps the most common txn in the ecosystem, manual optimizations/inlining can substantially reduce bytecode inefficiencies generated by sub-optimal abstractions.

Hence this PR optimizes the core execution paths for deposit, withdraw, and transfer functions, which I notice have substantial inefficiencies:

Redundant borrows inside inner functions.
Redundant error codes in inner functions.
Variable re-declaration.
Assorted double negations.
Redundant logical branches off the same condition.

Note that this is only a subset of functions that are used in production; should the optimizations prove meaningful during benchmarks then other functions would require optimizations too.

Assorted additional fixes

Currently, the deposit and withdraw functions both incorrectly emit the error code ESTORE_IS_FROZEN if the store doesn't exist, but they should emit EFUNGIBLE_STORE_EXISTENCE. Fix accordingly in the optimized refactors.
For ensure_store_upgraded_to_concurrent_internal which was recently de-inlined, remove an erroneous trailing comma that threw a compiler warning.
Fix broken test test_fungible_asset_upgrade.
1. Convert change_feature_flags to change_feature_flags_for_testing.
2. Fix broken call to create_aggregator_with_value in upgrade_to_concurrent, which was resulting in an aggregator overflow.
3. Include get_migrate_to_concurrent_fungible_balance_feature call for feature set, since its omission was preventing concurrent FA store upgrade.

Regression

@aching @davidiw @igor-aptos how do these optimizations affect the current benchmark regressions?

(Note that I am not aware of the current benchmark methodology, but I am glad to extend this to other functions if the benchmarks rely on different call paths)

trunk-io[bot] commented 2 weeks ago

⏱️ 46s total CI duration on this PR

Job	Cumulative Duration	Recent Runs
permission-check	12s	🟥 🟥
permission-check	12s	🟥 🟥
permission-check	12s	🟥 🟥
permission-check	10s	🟥 🟥

_{settings ⋅ feedback ⋅ docs ⋅ learn more about trunk.io}

igor-aptos commented 2 weeks ago

Thanks for working on this, will check in details what's applicable here.

To note, perf testing was done on top of https://github.com/aptos-labs/aptos-core/pull/13194, which optimizes a few (but not all) things covered here as well.

igor-aptos commented 2 weeks ago

We have not-yet-landed improvements to FA here.

Our metodology for testing throughput is a testsuite/single_node_performance.py benchmark, which with labels can be included on any PR (though for FA, also we need to enable APT FA by default) We made a set of benchmarks here - APT FA, with concurrent balance being default, and with it being opt-in.

We generally don’t focus on reducing gas per-se, but increasing throughput. From our experiments, biggest difference is how many resources are being read/accessed, and main regression here comes from needing to access ConcurrentFungibleBalance. Will benchmark your changes on top of the above, and see if we can get more out of it.

Thanks for looking into this!

alnoki commented 2 weeks ago

biggest difference is how many resources are being read/accessed

@igor-aptos If that's the case and if exists calls are cheap, then I'd suggest re-architecting structs to look like this:

    #[resource_group_member(group = aptos_framework::object::ObjectGroup)]
    /// The store object that holds fungible assets of a specific type associated with an account.
    struct FungibleStore has key {
        /// The address of the base metadata object.
        metadata: Object<Metadata>,
        /// The balance of the fungible metadata.
        balance: u64,
        /// If true, owner transfer is disabled that only `TransferRef` can move in/out from this store.
        frozen: bool,
    }

    #[resource_group_member(group = aptos_framework::object::ObjectGroup)]
    /// The store object that holds fungible assets of a specific type associated with an account.
    struct ConcurrentFungibleStore has key {
        /// The address of the base metadata object.
        metadata: Object<Metadata>,
        /// The balance of the fungible metadata.
        balance: Aggregator<u64>,
        /// If true, owner transfer is disabled that only `TransferRef` can move in/out from this store.
        frozen: bool,
    }

Then you can use algorithm flow like this:

    /// Deposit `amount` of the fungible asset to `store`.
    public fun deposit<T: key>(store: Object<T>, fa: FungibleAsset)
    acquires FungibleStore, DispatchFunctionStore, ConcurrentFungibleStore {

        // Declare local variables.
        let store_addr = object::object_address(&store);

        if (exists<ConcurrentFungibleStore>(store_addr)) {
            // Do all operations on concurrent store only.
        } else {
            if (features::migrate_to_concurrent_fungible_balance_enabled()) {
                // Migrate to concurrent fungible balance: here you have to borrow both the serial
                // store and the concurrent store, but only for one operation.
            } else {
                // Do all operations on serial store only.
            }
        }
    }

aptos-labs / aptos-core

Eliminate call path inefficiencies for concurrent balance FA APT transfers, make assorted fixes #13203

General

Assorted additional fixes

Regression