kkonevets closed this 2 years ago
It looks like you are introducing new accounts that hold state. This opens up a big can of worms security-wise, because now you need to reason about a set of accounts, and be very careful every time to check that the accounts are related in the right way. Also, it becomes harder to get a consistent view of the state from the RPC: now `GetMultipleAccounts` is the only safe option, but people will forget that and introduce torn reads that lead to subtle bugs.
If you read the Neodyme report, they confirm:
Every program on Solana has to store data somehow. There are multiple approaches to this, and solido chose one of the safest ones: each instance of the solido program has a single unique account that stores all data. While solido allows for multiple instances to exist simultaneously, it is written in a way that rules out many typical Solana vulnerabilities by design. By storing all state in a single account that is subsequently required in every operation, all attacks that work by confusing between multiple program-owned accounts are by definition impossible.
I don’t think these additional accounts are needed; the lists can go into the main Solido account, and you can still have the memory layout match the serialization format and transmute the input.
Also, it looks like much of the code in state.rs was copied verbatim from https://github.com/solana-labs/solana-program-library/blob/b5cec79ee4ae0dab014e7d5d9551b30f959245af/stake-pool/program/src/state.rs#L525. I am not a lawyer, but I am pretty sure that this is a violation of sections 4(a), 4(b), and 4(c) of the Apache 2.0 license that covers the Solana Program Library. If you do want to use any third-party code at all, I recommend starting with a verbatim copy of the original source in a separate file; don’t mix it with our existing code. Then add a very clear header that names the original author and copyright holder, states the license, and notes that you changed these files. And make the changes in a separate commit, so it is clear what the changes are relative to the original.
I deleted the message. Found a solution
Re: original message, what I would do is give the `Lido` struct a field `validators: &'a [(Pubkey, Validator)]`. That will give `Lido` itself a lifetime parameter though, which may be a very invasive change. Then in the deserializer that takes the `raw_account_data: &[u8]`, deserialize the known-size part of the `Lido` struct from the first n bytes, then transmute the remainder of the raw data into a `&[(Pubkey, Validator)]` and complete the struct.
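Roughly like this, as a toy sketch with stand-in `Pubkey` and `Validator` types (the real structs have more fields) and a `repr(C)` pair in place of the tuple, since plain tuples have no layout guarantee:

```rust
use core::mem::{align_of, size_of};

#[repr(C)]
#[derive(Clone, Copy)]
struct Pubkey([u8; 32]);

#[repr(C)]
#[derive(Clone, Copy)]
struct Validator {
    effective_stake_balance: u64,
}

// repr(C) pair standing in for (Pubkey, Validator), because plain
// tuples have no layout guarantee.
#[repr(C)]
#[derive(Clone, Copy)]
struct Entry {
    pubkey: Pubkey,
    validator: Validator,
}

struct Lido<'a> {
    version: u64,
    validators: &'a [Entry],
}

fn deserialize_lido(raw_account_data: &[u8]) -> Lido<'_> {
    // Known-size part of the struct: just a version field in this toy layout.
    let version = u64::from_le_bytes(raw_account_data[..8].try_into().unwrap());
    let rest = &raw_account_data[8..];
    let n = rest.len() / size_of::<Entry>();
    // The cast is only sound if `rest` is aligned for Entry. BPF has no
    // alignment requirements on reads, but on other targets an unaligned
    // reference like this is undefined behavior, hence the check.
    assert_eq!(rest.as_ptr() as usize % align_of::<Entry>(), 0);
    let validators =
        unsafe { core::slice::from_raw_parts(rest.as_ptr() as *const Entry, n) };
    Lido { version, validators }
}
```

The alignment assert is what makes the transmute defensible on non-BPF targets; on BPF it is vacuously cheap.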
If that is too invasive, or if it’s not directly possible to transmute the raw bytes into `(Pubkey, Validator)`, an alternative is to only put the size of the list in the `Lido` struct, and then in every method that needs access to the validator list, pass in the `&[u8]` separately, and deserialize the single validator at that offset. That approach also doesn’t need an additional lifetime parameter, but I think it’s a bit more error-prone. Also, some operations require walking the validator list, so if some deserialization is still required per validator that will be expensive; I think a transmute would be the best option.
@ruuda Now I place the maintainers and validators next to the `Lido` struct in the same account. Please take a look ASAP.
So about `ptr::read_unaligned`, I created this test: https://gist.github.com/ruuda/80fd0000d5d1f151565d8c6ebb471e82, then compiled it with `cargo build-sbf --dump`, which among other things dumps the disassembly to a file. When we compare the two functions, one that uses the unaligned `*const Foo` to `&Foo` cast, which is undefined behavior, and one that uses `ptr::read_unaligned`, you can see that they both generate the same BPF code. So there is no overhead to using `read_unaligned` for BPF targets, at least not in a setup like this, where we immediately access the fields. It compiles down to a normal memory access. You can see the read happens here. Apparently, there are no alignment requirements on reads in BPF. (So the BPF JIT compiler would have to compile all of them to unaligned reads if it can’t prove anything about the alignment of `r0`. But for the UB version, LLVM doesn’t have to use unaligned loads when it compiles for a non-BPF target!)
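For reference, this is roughly the shape of the two functions the gist compares (the exact gist code may differ; the bounds assert in the safer version is an addition here):

```rust
use core::ptr;

#[repr(C)]
struct Foo {
    x: u32,
}

// Undefined behavior if `data.as_ptr().add(off)` is not 4-byte aligned:
// producing a &Foo asserts an alignment that may not hold.
#[allow(dead_code)]
fn read_foo_x_ub(data: &[u8], off: usize) -> u32 {
    let foo: &Foo = unsafe { &*(data.as_ptr().add(off) as *const Foo) };
    foo.x
}

// Well-defined for any alignment: read_unaligned does a bytewise copy.
// The assert also rules out reading out of bounds of `data`.
fn read_foo_x_somewhat_safer(data: &[u8], off: usize) -> u32 {
    assert!(off + core::mem::size_of::<Foo>() <= data.len());
    let foo: Foo = unsafe { ptr::read_unaligned(data.as_ptr().add(off) as *const Foo) };
    foo.x
}
```

Note that actually calling the UB version with a misaligned offset is itself undefined behavior on non-BPF targets, so only the safer version is exercised here.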
Either that, or my `read_foo_x_somewhat_safer(data: &[u8], off: usize) -> u32` still contains undefined behavior that the compiler exploited. In particular, there is the possibility that it reads out of bounds of `data`. This stuff is super subtle.
Sure, BPF does not have alignment requirements; like x86_64, it allows unaligned loads. That's why my code works. Also, your asm snippet does not contain an `ALIGN` directive.
So there is no overhead to using `read_unaligned` for BPF targets, at least not in a setup like this, where we immediately access the fields
Immediately accessing a field does the trick, agreed. My main performance bottleneck is looping through the validators and summing `effective_stake_balance`. Seems like it will work this way, but we should bench it.
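A sketch of that loop, with a stand-in `Validator` type and an assumed entry layout of a 32-byte pubkey followed by the validator payload:

```rust
use core::mem::size_of;
use core::ptr;

const PUBKEY_SIZE: usize = 32;

// Stand-in for the validator payload; the real struct has more fields.
#[repr(C)]
#[derive(Clone, Copy)]
struct Validator {
    effective_stake_balance: u64,
}

const ENTRY_SIZE: usize = PUBKEY_SIZE + size_of::<Validator>();

/// Sum effective_stake_balance over every (pubkey, validator) entry.
/// read_unaligned copies each Validator, but because the field is used
/// immediately, the compiler can usually lower this to plain loads.
fn total_effective_stake(list_data: &[u8]) -> u64 {
    let n = list_data.len() / ENTRY_SIZE;
    let mut total = 0u64;
    for i in 0..n {
        let off = i * ENTRY_SIZE + PUBKEY_SIZE;
        let v: Validator =
            unsafe { ptr::read_unaligned(list_data.as_ptr().add(off) as *const Validator) };
        total += v.effective_stake_balance;
    }
    total
}
```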
In other scenarios `read_unaligned` will make a copy. The docs say `read_unaligned` creates a bitwise copy of `T`, regardless of whether `T` is `Copy`. I will maybe check it out, but will postpone it to the very end.
About alignment: that is not a problem for our use case. x86_64 and arm64 both support unaligned loads, which means a client with any modern CPU will be able to use our CLI without errors. I tested our CLI on an M1 (which is arm64) and it worked just fine. So whether you are on Mac, Windows or Linux, you will be fine. One exception is arm32, which is used in Android devices, but I guess our clients are not going to run the CLI on smartphones.
On the Lido account as a PDA account.
Here is one pitfall: PDA accounts have a limit on maximum data size, and it is much less than 1 MB (I can’t remember the exact value). About a month ago I experimented with the validator_list account as a PDA account and hit the limit. I also read about the limit somewhere on the internet. The point is that the validator and maintainer lists won’t fit within the limit; I have tried, I am 100% sure.
So now we have 3 choices, and I don't know which is better:
1. Reuse the current Lido account. As we can't resize it, we would have to withdraw all lamports from it, wait for the epoch boundary for it to be destroyed, and create it again with a new data size. So we would have to save the state somewhere and initialize the new account with it after the epoch boundary. Sounds very complicated and risky.
2. Create a new regular Lido account, and this will break all client integrations.
3. Create two new accounts: validator_list and maintainer_list, which hold the lists. This is what I did initially on the pull request. It has security issues, as you pointed out, but nevertheless the Solana stake-pool uses this approach.
In any case client integrations will be broken, because the Lido state serialisation format will change. Lots of clients use the Lido state account directly and parse it themselves. The JS frontend library will also change.
But the easiest and safest option is to create a new regular account. By the way, lesson learned: when creating a state account, create it with the maximum data size.
PDA accounts have a limit on max data size and it is much less than 1 MB
Hmm, that’s unfortunate :confused:
Create a new regular Lido account and this will break all client integrations.
We can still put all the state in a new (non-PDA) account and keep the old state at the old location. Whether it’s a PDA or not does not make such a big difference. A PDA is nice because then we can enforce there is only a single canonical instance, but for the current version the state is also a regular account. Of course old clients will not know where to find the new account, but even if we kept it at the same address they wouldn’t know how to parse the v2 data, so clients need to update either way. With a new account, at least the data layout will not change under their feet.*
For doing the migration, we can put it in the migrate instruction. It would take both the old and new address, and it must be signed by both the manager of the instance and the new address.
* I now realize, we shouldn’t allow the instance to be migrated more than once (we could trust the manager to not do that, but it would be better to make it impossible). The best way to achieve that, I think, is to clear out the original account after completing the migration. So yes, then it does break existing clients, but still, an update would do that any way.
By the way, lesson learned: when creating a state account, create it with the maximum data size
This is somewhat expensive though, about 7 SOL for 1 MB. But yeah, it sounds like it would be worth it. (And it’s not as bad now that the price crashed :grimacing:)
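For what it’s worth, the 7 SOL figure checks out against what I believe are the default rent parameters (3,480 lamports per byte-year, a 2-year exemption threshold, and 128 bytes of per-account storage overhead; these constants are assumptions, not read from the cluster):

```rust
// Back-of-envelope rent-exemption cost, using assumed default rent
// parameters rather than values fetched from the cluster.
const LAMPORTS_PER_BYTE_YEAR: u64 = 3_480;
const EXEMPTION_THRESHOLD_YEARS: u64 = 2;
const ACCOUNT_STORAGE_OVERHEAD: u64 = 128;
const LAMPORTS_PER_SOL: u64 = 1_000_000_000;

/// Minimum balance for an account of `data_len` bytes to be rent-exempt.
fn rent_exempt_minimum(data_len: u64) -> u64 {
    (data_len + ACCOUNT_STORAGE_OVERHEAD)
        * LAMPORTS_PER_BYTE_YEAR
        * EXEMPTION_THRESHOLD_YEARS
}
```

For a 1 MiB account this works out to about 7.3 SOL, consistent with the rough figure above.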
If we create a new Lido address, then we have to change all Lido program derived accounts, like the reserve account, stake authority and mint authority (which means we have to set a new mint authority for stSOL). If we save the old Solido address in the state and derive from that:

```rust
Pubkey::create_program_address(
    &[
        &OLD_SOLIDO_ADDRESS,
        MINT_AUTHORITY,
        &[self.mint_authority_bump_seed],
    ],
    program_id,
)
```

it breaks all security, I guess. I don't think we should go down this path.
So we should recreate all PDA accounts. Do you think that will work? Also, creating a new reserve account means we should transfer funds from the old one. Sounds complicated. Maybe just create validator and maintainer list accounts; that will work for sure.
If we save the old Solido address in the state and derive from that, it breaks all security, I guess
D'oh, yes, you are right. Hmm :frowning_face:
So we should recreate all PDA accounts. Do you think that will work?
Hmm, yes ... I think it would!
Also creating a new reserve account means we should transfer funds from the old. Sounds complicated.
Hey, now I have a new idea ... I think this might be quite nice actually, because we don’t need to care about backwards compatibility at all in the new version! Maybe this even simplifies things!
The new version would get a `Migrate` instruction. `Migrate` would transfer everything in the v1 reserve to the v2 reserve, and change the mint authority of the mint from the v1 authority to the v2 authority. `Migrate` can only be called after there are no active validators any more.

What do you think?
I think it is too complicated. We discussed it internally and decided to go back to the maintainer and validator list accounts. This is a simple, safe change and it will work for sure. I have tried enough already.
@ruuda I brought back the validator and maintainer list accounts. Please check only correctness and security, not code style.
The main goal of this PR is to increase the maximum number of validators and maintainers in a pool and optimize overall performance.

- `AccountMap` was replaced by `BigVec`, which was rewritten to add `get_mut(index)` and `remove(index)` methods
- Two new accounts, `validator_list` and `maintainer_list`, which hold the lists, are passed to `processor.rs`
- `validator_index` and `maintainer_index` are passed to instructions to precisely get/remove a value from a list, with credit to @ruuda
- Removal from a list is done with `swap_remove` in O(1), with credit to @ruuda
- The `Validator`, `Maintainer` and `Lido` structures got a `lido_version` field, which is checked
- `max_validators` and `max_maintainers` is now 6700, previously 64
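The `swap_remove` point is just `Vec::swap_remove` semantics: the vacated slot is filled with the last element, so removal is O(1) but order is not preserved, which is why instructions take an explicit index rather than assuming a stable order. A quick illustration:

```rust
// swap_remove fills the vacated slot with the last element: O(1) removal,
// but the list order changes afterwards.
fn swap_remove_demo() -> (&'static str, Vec<&'static str>) {
    let mut validators = vec!["a", "b", "c", "d"];
    let removed = validators.swap_remove(1); // removes "b"; "d" moves to index 1
    (removed, validators)
}
```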