coral-xyz / anchor

⚓ Solana Sealevel Framework
https://anchor-lang.com
Apache License 2.0

Anchor is heap hungry #2231

Open Arrowana opened 2 years ago

Arrowana commented 2 years ago

Problem

With transaction v2 and address lookup tables, it is now possible to do a lot more in a single program call. In fact, Jupiter is already capable of consuming up to 64 accounts (the current account lock max) for routed swaps through multiple pools.

A new issue that arises is that the default heap is only 32 * 1024 bytes and the default bump allocator does not deallocate. As a result, we start to see cases where we run out of heap.

https://github.com/solana-labs/solana/blob/d9b0fc0e3eec67dfe4a97d9298b15969b2804fab/sdk/program/src/entrypoint.rs#L101-L132

> Program logged: "Error: memory allocation failed, out of memory"
> Program consumed: 471325 of 786985 compute units
> Program returned error: "BPF program panicked"
> Program returned error: "Program failed to complete"

Example trying to find out where the heap goes

Using the default BumpAllocator and logging the allocator's pointer position, I produced this log for a single CPI: one CPI to a mercurial pool (using the to_account_infos and to_account_metas pattern, as CpiContext does).

8 accounts
AccountMeta => 12 * 34 = 408
AccountInfo => 12 * ? = ?

The log shows the diff of the pointer position at each allocation step:

[2022-10-17T23:30:06.725476370Z DEBUG solana_runtime::message_processor::stable_log] Program log: after to_account_metas: 370
[2022-10-17T23:30:06.726310747Z DEBUG solana_runtime::message_processor::stable_log] Program log: after to_account_infos: 1686

So 2056 bytes of heap (370 + 1686) are gone for that simple step.

I can produce a minimal repo to reproduce this if people are interested, but using the link from entrypoint.rs is pretty much what I did.

In comparison, a vanilla token transfer call: https://github.com/solana-labs/solana-program-library/blob/d02d49e2671c6af06633d21ecdb8970423880c75/token-swap/program/src/processor.rs#L187 allocates only AccountMetas in a Vec for the Instruction accounts field.

Possible solutions

Side effect

bonedaddy commented 2 years ago

You can use the Compute Budget program to request more heap memory; I think you can request around double the default heap.
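For reference, a transaction can request a larger heap by prepending a Compute Budget instruction. The sketch below builds the raw instruction data by hand so it runs without dependencies; the variant tag and little-endian encoding are assumptions about the borsh layout of ComputeBudgetInstruction, so in practice prefer ComputeBudgetInstruction::request_heap_frame from solana-sdk.

```rust
// Hedged sketch: raw instruction data for the Compute Budget program's
// RequestHeapFrame instruction. The variant index (1) and u32 LE encoding
// are assumptions; use solana-sdk's builder in real code.
fn request_heap_frame_data(bytes: u32) -> Vec<u8> {
    let mut data = Vec::with_capacity(5);
    data.push(1); // assumed enum variant: RequestHeapFrame
    data.extend_from_slice(&bytes.to_le_bytes());
    data
}

fn main() {
    // Request a 256 KiB heap (the size must be a multiple of 1024).
    let data = request_heap_frame_data(256 * 1024);
    assert_eq!(data, vec![1, 0, 0, 4, 0]);
}
```

The resulting instruction targets the Compute Budget program and is added to the transaction alongside the program call that needs the extra heap.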

--

I think the biggest issue comes from the way vectors are used. If you look at the implementations of to_account_infos, etc., they all initialize the vector with zero capacity using Vec::new(). If you are going to push 12 accounts onto this vector, you will end up reallocating memory several times in total.

You can drastically reduce allocations by using Vec::with_capacity(est_num_accounts) instead; because the vector never has to reallocate, no heap is wasted on abandoned intermediate buffers.
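To make the effect concrete, here is a small self-contained sketch counting how many times a `Vec` regrows while elements are pushed. `count_reallocs` is illustrative, not an Anchor API, and the exact numbers depend on the element type and the standard library's growth policy; the point is that every regrowth is a fresh allocation a bump allocator never reclaims.

```rust
// Count how many times the Vec's backing buffer is replaced while
// pushing `n` elements one by one, starting from Vec::new().
fn count_reallocs(n: usize) -> usize {
    let mut v: Vec<u64> = Vec::new();
    let mut reallocs = 0;
    let mut cap = v.capacity();
    for i in 0..n as u64 {
        v.push(i);
        if v.capacity() != cap {
            reallocs += 1;
            cap = v.capacity();
        }
    }
    reallocs
}

fn main() {
    // With the current std growth policy this is 3 (0 -> 4 -> 8 -> 16),
    // i.e. 28 abandoned slots' worth of heap under a bump allocator.
    println!("regrowths for 12 pushes: {}", count_reallocs(12));

    // With a known upper bound, a single allocation suffices:
    let w: Vec<u64> = Vec::with_capacity(12);
    assert_eq!(w.capacity(), 12);
}
```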

Arrowana commented 2 years ago

I am aware of both of those solutions, but I don't think they are great in general:

Henry-E commented 2 years ago

One solution that I'm investigating for the new set of constraints is switching account_info to being accessed as a zero-copy struct instead of copying it to the heap. That should remove quite a bit of that overhead, right?

Henry-E commented 2 years ago

Ok, so to_account_metas and to_account_infos don't contain remaining_accounts from what I can see. But the reason they have the system where they extend a Vec with vec![] is that Anchor supports nested derive(Accounts) structs. So you don't know whether an account type might actually be another derive(Accounts) struct with more than one account in it.

If the reallocation of the Vecs really is the major issue here, then we could possibly add a len attribute or something similar to the account types. This would allow allocating the correct vector length up front (although maybe optional accounts mess with this 🤷).
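The len-attribute idea could look roughly like the hypothetical `Len` trait below. None of these names exist in Anchor; this only illustrates how nested Accounts-style structs could report their account count at compile time so the top-level Vec is sized once.

```rust
// Hypothetical trait: how many accounts a (possibly nested) struct
// contributes, computed as a compile-time constant.
trait Len {
    const LEN: usize;
}

// Stands in for a single account field.
struct Leaf;
impl Len for Leaf {
    const LEN: usize = 1;
}

// A nested derive(Accounts)-style struct: its length is the sum of its fields.
struct Nested {
    a: Leaf,
    b: Leaf,
}
impl Len for Nested {
    const LEN: usize = Leaf::LEN + Leaf::LEN;
}

struct Outer {
    payer: Leaf,
    inner: Nested,
}
impl Len for Outer {
    const LEN: usize = Leaf::LEN + Nested::LEN;
}

fn main() {
    // One exactly-sized allocation instead of repeated doubling while extending.
    let metas: Vec<u8> = Vec::with_capacity(Outer::LEN);
    assert_eq!(metas.capacity(), 3);
}
```

Optional accounts would indeed complicate this, since `LEN` would only be an upper bound rather than the exact count.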

luke-truitt commented 1 year ago

Was there a clean solution to this @Arrowana?

Arrowana commented 1 year ago

> Was there a clean solution to this @Arrowana?

No full solution yet; there are some major savings possible by carefully creating any Vec with Vec::with_capacity.

Also, while this gobbles up quite a bit of heap, the other area for improvement is the use of BorshSerialize::try_to_vec:

https://github.com/near/borsh-rs/blob/dc1b0aa459a574b314f7bfab752db6d27319bb3f/borsh/src/ser/mod.rs#L58-L62

It is used in data(&self) -> Vec<u8> for instruction data for CPIs, events, the new event_cpi...

It calls Vec::with_capacity(1024), so no matter how small the serialized value is, up to 1024 - 8 bytes are allocated instantly for nothing.
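The waste can be emulated without the borsh crate; `try_to_vec_like` below is a stand-in for what borsh's serializer does with its default 1024-byte starting capacity, not borsh's actual code.

```rust
// Emulate borsh's try_to_vec: start from a fixed 1 KiB buffer regardless
// of how small the payload is.
fn try_to_vec_like(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(1024); // what borsh does up front
    out.extend_from_slice(payload);
    out
}

fn main() {
    let small = try_to_vec_like(&[0u8; 16]);
    // 16 bytes of data, but 1024 bytes claimed from the bump allocator.
    assert_eq!(small.len(), 16);
    assert_eq!(small.capacity(), 1024);
}
```

Under a bump allocator, that over-allocation is never returned, so every CPI or event serialization burns roughly 1 KiB even for tiny payloads.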

I believe a rework is needed to potentially leverage the existing proc macro of InitSpace, so that we can have auto max sizing to allow each implementation to become something like

pub trait InstructionData: Discriminator + AnchorSerialize + Space {
    fn data(&self) -> Vec<u8> {
        let mut d = Vec::with_capacity(8 + Self::INIT_SPACE);
        d.extend(Self::discriminator());
        self.serialize(&mut d).expect("Should always serialize");
        d
    }
}

#[automatically_derived]
impl anchor_lang::Space for #name {
    const INIT_SPACE: usize = 1 + #max;
}

Or with auto sizing optional

Aursen commented 1 year ago

I hadn't thought of that, but it could be a good solution. Also, implementing the Iterator trait instead of using vectors could be a good approach.
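The iterator idea, sketched with illustrative names (these are not Anchor APIs): chaining iterators defers allocation to a single, exactly-sized collect at the end, instead of building and regrowing intermediate Vecs at each nesting level.

```rust
// Illustrative stand-in for AccountMeta.
struct Meta {
    pubkey: [u8; 32],
    is_signer: bool,
}

// Yield metas lazily; nothing is allocated here.
fn metas<'a>(accounts: &'a [[u8; 32]]) -> impl Iterator<Item = Meta> + 'a {
    accounts.iter().map(|k| Meta { pubkey: *k, is_signer: false })
}

fn main() {
    let keys = [[0u8; 32], [1u8; 32]];
    // collect() sees the exact size hint, so this is one allocation total.
    let collected: Vec<Meta> = metas(&keys).collect();
    assert_eq!(collected.len(), 2);
    assert!(!collected[0].is_signer);
}
```

Nested Accounts structs could chain their children's iterators together, so only the final consumer decides whether (and how big) a Vec is needed.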

acheroncrypto commented 1 year ago

> I believe a rework is needed to potentially leverage the existing proc macro of InitSpace, so that we can have auto max sizing to allow each implementation to become something like [...]

I'm on board with decreasing the usage of BorshSerialize::try_to_vec on the program side, but from what I can see, InstructionData is only used from the client side and it shouldn't cause any extra heap allocations on the program side.

Arrowana commented 1 year ago

> I believe a rework is needed to potentially leverage the existing proc macro of InitSpace, so that we can have auto max sizing to allow each implementation to become something like [...]
>
> I'm on board with decreasing the usage of BorshSerialize::try_to_vec on the program side, but from what I can see, InstructionData is only used from the client side and it shouldn't cause any extra heap allocations on the program side.

No, it does. When using the CPI instructions, or when emitting events, it is also used, so this is happening program side. I wasn't clear enough: all serialization steps are affected, except serializing directly into account data.

https://github.com/coral-xyz/anchor/blob/10eb6989124ec7cccd693a131d494c5b2b7bfefc/lang/syn/src/codegen/program/cpi.rs#L35-L47

https://github.com/coral-xyz/anchor/blob/10eb6989124ec7cccd693a131d494c5b2b7bfefc/lang/attribute/event/src/lib.rs#L36-L42

acheroncrypto commented 1 year ago

> No, it does. When using the CPI instructions, or when emitting events, it is also used, so this is happening program side. I wasn't clear enough: all serialization steps are affected, except serializing directly into account data.

I think I wasn't clear enough either. I'm on board with optimizing the usage of BorshSerialize::try_to_vec on the program side (which is used in CPI, events, and return data), but the example code you've given is about InstructionData, which only seems to be used from the client side. Could you share the usage of InstructionData from the program side if I'm missing something here?

Arrowana commented 1 year ago

> Could you share the usage of InstructionData from the program side if I'm missing something here?

Oops, correct, it isn't. So not InstructionData, but the other ones.