Closed marsella closed 1 year ago
Here are some initial thoughts from looking over the Participant
code.
It looks like every Participant
retains its own id
and other_participant_ids
, and takes a main_storage
as an argument to its various methods. Currently, the top-level Participant
retains main_storage
in its struct and passes it as needed. Here's my initial suggestion(s):
ParticipantInfo
(or some better name) struct which contains id
, other_participant_ids
, and main_storage
. The caller is responsible for creating this struct. We can then pass this wrapped in Arc<Mutex<>>
to the top-level Participant
, which can pass this on to all the sub-protocols. Now, each protocol has access to the main_storage
produced by the caller, as well as any local_storage
it needs.Storage
struct to a Storage
trait with a separate "public" and "secret" storage API. It is up to the caller to implement that correctly (is that too risky?), but then internally we can use this new Storage
API to store things properly.Here are some additional thoughts worth considering in the above design:
Arc<Mutex<>>
the entire ParticipantInfo
struct, or just main_storage
? Probably the latter is sufficient?Storage
trait? We could have a default implementation that the caller could use while still allowing the caller to implement their own if necessary.Storage
trait still needs to be designed, but I'd imagine it'd look similar to what we have now, except separating out calls to "public" and "secret" storage.I'm thinking it would be good to throw other_participant_ids
into Storage, where it can be tagged with a session id, so that Participants can handle different peer groups in different sessions concurrently.
I like the idea of a Storage trait. And I agree with providing a default implementation that just stores things in memory like we do now, especially so we have something to use in tests (would it make sense to provide an Encrypted memory version too?). But I like this approach for making it modular
One issue with moving the current Storage
API to a trait is it exposes a bunch of internals (such as StorableType
). I think the way around this is to have store
just take a key-value pair, e.g.:
fn store<K: Storable>(&mut self, key: K, data: &[u8]) -> Result<()>;
And likewise for retrieve
. This'll require a big refactoring of the code but should simplify the outward facing API greatly.
Here's what I've got currently for the Storage
trait. It seems to capture everything we need while hiding internals:
pub trait StorageT: Debug {
/// Store `data` at the entry provided by `key`.
fn store<K: Storable>(&mut self, key: K, data: &[u8]) -> Result<()>;
/// Retrieve data given `key`.
fn retrieve<K: Storable>(&self, key: K) -> Result<Vec<u8>>;
/// Transfer data at entry `key` to `storage`.
fn transfer<K: Storable, S: StorageT>(&self, key: K, storage: &mut S) -> Result<()>;
/// Delete data at entry `key`, returning the serialized result.
fn delete<K: Storable>(&mut self, key: K) -> Result<Vec<u8>>;
/// Returns `true` if all `keys` are used.
fn contains_batch<K: Storable>(&self, keys: &[K]) -> Result<bool>;
/// Retrieve deserialized data given `key`.
fn retrieve_deserialized<K: Storable, D: DeserializeOwned>(&self, key: K) -> Result<D> {
let data = self.retrieve(key)?;
deserialize!(&data)
}
}
The one thing it doesn't capture is public versus secret storage. I think that can be handled either with a bool
argument denoting where to store the data, or with separate store_secure
etc. functions. I'd vote for the former for API simplicity.
Here's the latest and greatest API. It contains a SecurityType
annotation to let the implementation know whether to store the entry in secure storage or not.
/// A data storage API.
///
/// This trait defines a key-value store for storing public and private data during
/// the execution of the protocol. Whether a value should be stored securely or not
/// is specified by the [`SecurityType`] argument. **It is up to the implementer to implement secure
/// storage!**
pub trait StorageT: Debug {
/// Store serialized `data` at the entry associated with `key`.
fn store<K: Storable>(&mut self, key: K, data: &[u8], security: SecurityType) -> Result<()>;
/// Retrieve data associated with `key`, returning the serialized result.
fn retrieve<K: Storable>(&self, key: K, security: SecurityType) -> Result<Vec<u8>>;
/// Transfer data associated `key` to `other`.
fn transfer<K: Storable, S: StorageT>(
&self,
key: K,
other: &mut S,
security: SecurityType,
) -> Result<()>;
/// Delete data associated with `key`, returning the serialized result.
fn delete<K: Storable>(&mut self, key: K, security: SecurityType) -> Result<Vec<u8>>;
}
Thanks for making progress on this! I think my overall reaction is that this is a reasonable abstraction based on the storage we currently have, but think there are some improvements we could make to the current storage model that would lead us to a nicer trait. I'm wondering if we might try to break this down into a few internal refactors first before abstracting the trait.
For example, I don't like the current method of storing plain bytes even though we know the type of everything stored, and I don't like that we save the StorableType
but don't actually tie that to the stored/retrieved type (which the caller has to pick/serde manually). I'd rather pass Rust types directly in the storage API. I'm not sure the best way to do this (maybe have the value type be enum StoredData
which is like the current StorableType
enum, but actually containing the data??)
Question about the abstraction: Do you think the key type will it always be the tuple of (stored type, session id, party id)
? I'm trying to think about whether it's plausible that we'd change that index type or use different ones for different scenarios internally, or whether it'd be reasonable to make StorageIndex
public (but the internals private) and use it directly. Is the fully generic key type going to trip us up at some point?
I like the security type flag. It's an improvement over what we currently have. I wonder if we could try to tie this to the stored data type, since we already know what the secure types are. This would just be an internal misuse-resistance thing to prevent us from calling the wrong one. For transferring, I wonder if that should take a security type or if we should document that the implementation should match the security type from the source storage in the destination?
I see in your branch that you've already done a bunch of work to pull out the trait, so maybe it's reasonable to pull it out but plan that it's not our "final version" -- we can follow up by identifying some good refactors in separate issues. Do you or @gatowololo have preferences on the ordering (refactor storage vs pull out trait)?
One other idea I just thought of is to have the library handle encryption itself; that is, there's a wrapper around the
StorageT
trait that's used internally and encrypts things that are markedPrivate
before passing them to theStorageT
implementation (and thenStorageT
would lose theSecurityType
tag). This would avoid the user needing to worry about implementing a secureStorageT
instantiation, which seems more misuse-resistant, if not some additional complexity on the library side.
The difficulty of this approach is that we now need to also handle key management for secure storage. I don't want to try to make that decision in a generic way (or even to make the decision of, like, how are we encrypting and what's your key size and so on).
For example, I don't like the current method of storing plain bytes even though we know the type of everything stored, and I don't like that we save the
StorableType
but don't actually tie that to the stored/retrieved type (which the caller has to pick/serde manually). I'd rather pass Rust types directly in the storage API. I'm not sure the best way to do this (maybe have the value type beenum StoredData
which is like the currentStorableType
enum, but actually containing the data??)
I really like this idea. To avoid exposing StoredData
to the user I think we can keep the StorageT
trait the same and have an internal store
operation that remaps the StoredData
values into key
and data
values to be consumed by the trait.
Question about the abstraction: Do you think the key type will it always be the tuple of
(stored type, session id, party id)
? I'm trying to think about whether it's plausible that we'd change that index type or use different ones for different scenarios internally, or whether it'd be reasonable to makeStorageIndex
public (but the internals private) and use it directly. Is the fully generic key type going to trip us up at some point?
Good question. I honestly don't know. A fully generic key type is certainly the most general, so if there is an inkling we'd need to store different key-value pairs I can see it being a benefit (that way the public API is stable). But it certainly is more general than what we need now.
I like the security type flag. It's an improvement over what we currently have. I wonder if we could try to tie this to the stored data type, since we already know what the secure types are. This would just be an internal misuse-resistance thing to prevent us from calling the wrong one. For transferring, I wonder if that should take a security type or if we should document that the implementation should match the security type from the source storage in the destination?
Yes, this is a great idea. As above, we can keep this machinery internal and still require the StorageT
type to separate out public and private stores.
I see in your branch that you've already done a bunch of work to pull out the trait, so maybe it's reasonable to pull it out but plan that it's not our "final version" -- we can follow up by identifying some good refactors in separate issues. Do you or @gatoWololo have preferences on the ordering (refactor storage vs pull out trait)?
I was mostly doing that because it helps me reason through the API when I make the changes explicit in the code. I'm open to ditching the branch if we'd like to take a different approach to the refactor. Alternatively, I think what I have now might be a good starting point (in that it pulls out the StorageT
trait [I'm also open to a better name than StorageT
...]), and it should be straightforward to make some of the above changes on top of that version.
I like the current direction you guys are proposing. Some thoughts (Part 1/N):
We can then pass this wrapped in Arc<Mutex<>> to the top-level Participant, which can pass this on to all the sub-protocols. Now, each protocol has access to the main_storage produced by the caller, as well as any local_storage it needs.
Is it necessary to Arc<Mutex<>> the entire ParticipantInfo struct, or just main_storage? Probably the latter is sufficient?
I like this approach. Specially having the Participant and Sub-Participants share the participant IDs via a reference. Ideally, A sub-participant would not get a reference to the whole main_storage, but only the subset it needs?
is it too risky to have the caller implement the Storage trait? We could have a default implementation that the caller could use while still allowing the caller to implement their own if necessary.
I think it depends how complicated the storage implementation ends up being for the caller. Either way, we will need a "reference" implementation for our own testing and prototyping.
I'm thinking it would be good to throw other_participant_ids into Storage, where it can be tagged with a session id, so that Participants can handle different peer groups in different sessions concurrently.
Right, I think one nice thing about the current implementation is the separation of code and state. We could even take it one step further: I have been thinking that we would instantiate a new Participant per session (passing in the necessary starting state). This would work well with our task-based Tokio async model our server will be using. We can spawn a new participant object to run on our Tokio tasks. So multiple concurrent sessions would not share any state with each other.
I don't like the current method of storing plain bytes even though we know the type of everything stored, and I don't like that we save the StorableType but don't actually tie that to the stored/retrieved type (which the caller has to pick/serde manually).
Agreed! For example, StorableType::BroadcastSet
appears to be associated with a message_votes: HashMap<BroadcastIndex, Vec<u8>>
but this is not obvious except at the usage sites. It would be nice for the Storage to be aware of this.
I'd rather pass Rust types directly in the storage API. I'm not sure the best way to do this (maybe have the value type be enum StoredData which is like the current StorableType enum, but actually containing the data??)
Ideally the storage would work with Rust types directly. Our Rust types would implement ser/de still so that the user of our library could e.g. store them in some persistent storage.
This could be done a few ways: 1) As you pointed out, using an enum that associates the enum tag with the data:
enum StorableData {
BroadcastSet(HashMap<BroadcastIndex, Vec<u8>>),
}
One challenge is that Storage
is both used for main storage and local storage. So it needs to support many types of data and StorableTypes
. We only need the calling application to manage main storage right? So I suggest having to separate types: LocalStorage
and MainStorage
this would significantly reduce the types of data our trait needs to allow for.
After some further offline discussion, we decided that this library should not interact with persistent storage at all. The calling application will handle storage and retrieval. This allows the calling application to store tss-ecdsa outputs alongside any use-case-specific context. The library should now:
@amaloz I'm going to take over the remainder of this issue, which is writing up issues to get rid of all library-controlled persistent storage.
I made #203 #204 #205 and #213. Closing this as I think the plan is ready.
We need to allow the caller to control storage of public and secret material between instantiations of subprotocols. This includes:
Currently, these are maintained in the
main_storage
field of aParticipant
. They are stored and retrieved correctly throughout the protocol via the public API in the storage module.Requirements and comments
Completion criteria