Integrate BigVec data structure

kkonevets commented 2 years ago

The main goal of this PR is to increase maximum number of validators and maintainers in a pool and optimize overall performance.

AccountMap was replaced by BigVec which was rewritten to add get_mut(index) and remove(index) methods
optimized some places in processor.rs
added two new accounts validator_list and maintainer_list which hold the lists
validator_index and maintainer_index are passed to instructions to precisely get/remove a value from a list, with credit to @ruuda
deletion from a list is done with swap_remove in O(1), with credit to @ruuda
Validator, Maintainer and Lido structures got lido_version field, which is checked
Account type is written on serialization and checked on deserialization
current max_validators and max_maintainers is 6700, previously 64

ruuda commented 2 years ago

It looks like you are introducing new accounts that hold state. This is opens up a big can of worms security-wise, because now you need to reason about a set of accounts, and be very very careful every time to check that the accounts are related in the right way. Also, it becomes a harder to get a consistent view of the state from the RPC — now GetMultipleAccounts is the only safe option, but people will forget that and introduce torn reads that lead to subtle bugs.

If you read the Neodyme report, they confirm:

Every program on Solana has to store data somehow. There are multiple approaches to this, and solido chose one of the safest ones: each instance of the solido program has a single unique account that stores all data. While solido allows for multiple instances to exist simultaneously, it is written in a way that rules out many typical Solana vulnerabilities by design. By storing all state in a single account that is subsequently required in every operation, all attacks that work by confusing between multiple program-owned accounts are by definition impossible.

I don’t think these additional accounts are needed; the lists can go into the main Solido account, you can still have the memory layout match the serialization format and transmute the input.

Also, it looks like much of the code in state.rs was copied verbatim from https://github.com/solana-labs/solana-program-library/blob/b5cec79ee4ae0dab014e7d5d9551b30f959245af/stake-pool/program/src/state.rs#L525. I am not a lawyer, but I am pretty sure that this is a violation of section 4a, 4b, and 4c of the Apache 2.0 license that covers the Solana Program Library. If you do want to use any third-party code at all, I recommend starting with a verbatim copy of the original source, in a separate file, don’t mix it with our existing code. Then add a very clear header that says who the original author and copyright holder is, and the license, and that you changed these files. And do the changes in a separate commit, so it is clear what the changes are versus the original.

kkonevets commented 2 years ago

I deleted the message. Found a solution

ruuda commented 2 years ago

Re: original message, what I would do is give the Lido struct a field validators: &'a [(Pubkey, Validator)]. That will give Lido itself a lifetime parameter though, which may be a very invasive change. Then in the deserializer that takes the raw_account_data: &[u8], deserialize the known-size part of the Lido struct from the first n bytes, then transmute the remainder of the raw data into a &[(Pubkey, Validator)] and complete the struct.

If that is too invasive, or if it’s not directly possible to transmute the raw bytes into (Pubkey, Validator), an alternative is to only put the size of the list in the Lido struct, and then in every method that needs access to the validator list, pass in the &[u8] separately, and deserialize the single validator at that offset. That approach also doesn’t need an additional lifetime parameter, but I think it’s a bit more error-prone. Also, some operations require walking the validator list, so if some deserialization is still required per validator that will be expensive; I think a transmute would be the best option.

kkonevets commented 2 years ago

@ruuda Now I place maintainers and validators next to Lido struct in same account. Please take a look ASAP

ruuda commented 2 years ago

So about ptr::read_unaligned, I created this test: https://gist.github.com/ruuda/80fd0000d5d1f151565d8c6ebb471e82, then compiled it with cargo build-sbf --dump, which among others dumps the disassembly to a file. When we compare the two functions, one that uses the unaligned *const Foo to &Foo cast, which is undefined behavior, and one that uses ptr::unaligned_read, you can see that they both generate the same BPF code. So there is no overhead to using unaligned_read for BPF targets, at least not in a setup like this, where we immediately access the fields. It compiles down to a normal memory access. You can see the read happens here. Apparently, there are no alignment requirements on reads in BPF. (So the BPF JIT compiler would have to compile all of them to unaligned reads if it can’t prove anything about the alignment of r0. But for the UB version, LLVM doesn’t have to use unaligned loads, when it compiles for a non-BPF target!)

Either that, or my read_foo_x_somewhat_safer(data: &[u8], off: usize) -> u32 still contains undefined behavior that the compiler exploited. In particular there is the possibility that it reads out of bounds of data. This stuff is super subtle.

kkonevets commented 2 years ago

Sure bpf does not have alignment requirements, it's like x86_64 allows unaligned loads. That's why my code works. Also your asm snippet does not contain ALIGN directive.

So there is no overhead to using unaligned_read for BPF targets, at least not in a setup like this, where we immediately access the fields

Immediately accessing a filed does the trick, agree. I have a main performance bottleneck - looping through validators and summing effective_stake_balance. Seems like it will work this way. But we should bench it.

In other scenarios read_unaligned will make a copy. Docs say read_unaligned creates a bitwise copy of T, regardless of whether T is Copy I will maybe check it out, but will postpone to the very end.

kkonevets commented 2 years ago

About alignment. That is not a problem for our usecase. x86_64 and arm64 all support unaligned loads. Which means a client with any modern CPU will be able to use our CLI without errors. I tested our CLI on M1(which is arm64) and it worked just fine. So whether you are on Mac, Windows or Linux you will be fine. One exception is arm32 which is used in Android devices. But I guess our clients are not going to run CLI on smartphones.

kkonevets commented 2 years ago

On Lido account as a PDA account.

Here is one pitfall. PDA accounts have limit on max data size and it is much less than 1Mb (can’t remember exact value). About a month ago I have experimented with validator_list account as a PDA account and found the limit. Also I read about the limit somewhere on the internet. The point is that validator and maintainer lists won’t fit the limit, I have tried, I am 100% sure.

So now we have 3 choices, don't know which is better:

Reuse the current Lido account. As we can't resize it we should withdraw all lamports from it, wait for epoch boundary for it to be destroyed and create it again with a new data size. So we should save the state somewhere and init the new account with it after epoch boundary. Sounds very complicated and risky.
Create a new regular Lido account and this will break all client integrations.
Create two new accounts: validator_list and maintainer_list accounts. This is what I have done initially on the pull request. This has security issues as you pointed out, but nevertheless Solana stake-pool uses this approach.

In any case client integrations will be broken because Lido state serialisation format will change. Lots of clients use Lido state account directly and parse it themselves. Also JS frontend library will change.

But the easiest and safest option is to create a new regular account. By the way, lesson learned - when creating a state account create it with maximum data size

ruuda commented 2 years ago

PDA accounts have limit on max data size and it is much less than 1Mb

Hmm, that’s unfortunate :confused:

Create a new regular Lido account and this will break all client integrations.

We can still put all the state in a new (non-PDA) account and keep the old state at the old location. Whether it’s a PDA or not does not make such a big difference. A PDA is nice because then we can enforce there is only a single canonical instance, but for the current version the state is also a regular account. Of course old clients will not know where to find the new account, but even if we kept it at the same address they wouldn’t know how to parse the v2 data, so clients need to update either way. ~~With a new account, at least the data layout will not change under their feet.~~*

For doing the migration, we can put it in the migrate instruction. If would take both the old and new address, and it must be signed by both the manager of the instance and the new address.

* I now realize, we shouldn’t allow the instance to be migrated more than once (we could trust the manager to not do that, but it would be better to make it impossible). The best way to achieve that, I think, is to clear out the original account after completing the migration. So yes, then it does break existing clients, but still, an update would do that any way.

ruuda commented 2 years ago

By the way, lesson learned - when creating a state account create it with maximum data size

This is somewhat expensive though, about 7 SOL for 1 MB. But yeah, it sounds like it would be worth it. (And it’s not as bad now that the price crashed :grimacing:)

kkonevets commented 2 years ago

If we create a new Lido address then we should change all Lido program derived accounts like reserve account, stake authority and mint authority (which means we should set new mint authority for stSOL). If we save in the state an old Solido address and derive from that

Pubkey::create_program_address(
    &[
        &OLD_SOLIDO_ADDRESS,
        MINT_AUTHORITY,
        &[self.mint_authority_bump_seed],
    ],
    program_id,
)

it breaks all security, I guess. I don't thinks we should go this path.

So we should recreate all PDA accounts. Do you think that will work? Also creating a new reserve account means we should transfer funds from the old. Sounds complicated. Maybe just create validator and maintainer list accounts. That will work for sure.

ruuda commented 2 years ago

If we save in the state an old Solido address and derive from that it breaks all security, I guess

D'oh, yes, you are right. Hmm :frowning_face:

So we should recreate all PDA accounts. Do you think that will work?

Hmm, yes ... I think it would!

Also creating a new reserve account means we should transfer funds from the old. Sounds complicated.

Hey, now I have a new idea ... I think this might be quite nice actually, because we don’t need to care about backwards compatibility at all in the new version! Maybe this even simplifies things!

Create v2 without regard for backwards compatibility (though I would keep the version byte, just to avoid ambiguity).
Deploy the program to a new address, keep the v1 program at the old address.
The v2 program would have a new PDA for its reserve, and also for the mint authority.
Build a v1.4 on top of v1, with only one new additional instruction: Migrate. Migrate would transfer everything in the v1 reserve to the v2 reserve, and change the mint authority of the mint from the v1 authority to the v2 authority.
After the migration happens, v1 can no longer mint or burn stSOL, so deposts and withdraws on it would stop working. It should enforce then, that Migrate can only be called after there are no active validators any more.
All of the complexity of the migration then stays in this v1.4 program, which is basically a single-use program.
This makes the v2 program simpler, so easier to audit and reason about, and it also makes it a bit easier to review the migration in isolation.

What do you think?

kkonevets commented 2 years ago

I think it is too complicated. We discussed it internally and decided to reset everything back to maintainer and validator list accounts. This is a simple safe change and it will work for sure. I have tried enough already.

kkonevets commented 2 years ago

@ruuda I returned back validator and maintainer list accounts. Please check only correctness and security, not code style.

ChorusOne / solido

Integrate BigVec data structure #573