REPROSEC / dolev-yao-star-extrinsic

DY* with extrinsic proofs
https://reprosec.org/
Mozilla Public License 2.0
8 stars 0 forks source link

Some cleanup on states/session/version and types for session id and timestamp #32

Closed cwaldm closed 1 month ago

cwaldm commented 1 month ago

Fixing https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/6: renamed occurrences of "state" and "session" to "version" and added a comment explaining the state model. There are no actual changes to the implementation. In particular, there is no new version id. The current version is just the latest SetVersion entry on the trace. I think, that matches what we previously had: There could have been several SetState entries with the same session id and only the last one was relevant. Predicates/invariants are still called "state_predicate", since we are about to change some of those to support invariants for sessions and full states in the context of https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/9. One thing left to discuss is corruption/attacker knowledge. Currently, we only corrupt whole sessions. With this, corrupted_state_is_publishable says that any content stored under a corrupted session id is publishable. We have to discuss, whether this is what we want. This is related to https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/21 and should be resolved there.

Partially addressing https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/30: I introduced types for session id and timestamps (and used those in the functions). The session_id type should be hidden behind an interface file. The timestamp type maybe not.

TWal commented 1 month ago

Thanks, it's certainly more clean to have type alias for timestamps and state identifiers. As it is done here, it is mostly cosmetic as it is still equivalent to a nat for F*'s type system. We could hide that away by defining a new type, e.g. type state_id = { the_id: nat; }.

About the session / state thing however, I'm not sure this is going to the right direction, for example I find it a bit weird to name things set_version, if someone is not familiar with DY* codebase they would have no idea what this means (set version of what?), so I would rather keep it as set_state as the name conveys more clearly the behavior of the function.

I think we should have agreed more explicitly on the change of naming convention before diving into the code.

Here is the nomenclature I had in mind, which can be debated, but let's settle on this before doing the big refactor:

Also many of these names are only relevant for proofs (e.g. "full state" or "state history") therefore I think these names shouldn't impact the names that are used when writing specifications (such as set_state).

What do you think about it?

cwaldm commented 1 month ago

I reverted the state/session changes, so that we are left with the type aliases for session id and timestamp. Those were just meant as a preparation step for now (I know that they don't enforce anything in their current form). But I'll add the hidden type definition for session_id as you suggested (if you prefer this over using interfaces? (I do)). Then we have a clean state of the branch that could be merged.

We can then continue the state discussion in the corresponding issue https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/6 and create a new branch for that later. (I can quote your thoughts on the state at that issue and reply over there.) Does that make sense? Or do you prefer, resolving it within this PR?

cwaldm commented 1 month ago

Regarding the names for the states: If I understand things correctly, the translation dictonary between the two suggestions is:

@cwaldm @TWal
the matrix full state full state
one column session state history
index of the column session_id state_id
one cell version state

I would like to have a speaking name for the column of the matrix for the following reasons:

  1. The index of a column should have a meaningful name. I think, state_id from @TWal suggestion is confusing. Since a state is one cell of the matrix, I would expect state_id to identify the state, i.e., the cell. But here state_id is referring to only one of the indices, related to the history. It seems that "state" is used with two meanings (referring to a single cell and the whole history) which could be confsuing (although it is of course close to how we thinkg of states...). Unfortunately, history_id doesn't convey the right intuition.

  2. The operation "add a new column to the matrix (containing one entry)" should have a meaningful name. Currently, we have set_state which refers to "add a new entry to a(n existing) column". This is fine (the intuition given by the name is correct). But how should the first operation (using my terms: inizialize a new session with the following version) be called? I think, it helps for reading specifications, if we make the distinction between "adding to an existing column" and "creating a new column" explicit in the function names. Again, set_history is not really the right thing (and other options such as add_state, add_new_state or set_new_state are even more confusing).

Maybe a compromise could be

the matrix full state
one column session
index of the column session_id
one cell state

Then single cells are states (close to intuition), a session consists of states, and we can use session_id, set_state and set_session all giving the right intuition (once you know what session and state mean in DY*, but I don't think we can avoid that part).

This suggestion already includes the following thought: I don't think, we need to distinguish "protocol state-machine" states and other (for example, PKI) states. I see that there is a difference: the entries in a protocol column are ordered (the chronology of the state machine), whereas the entries in a PKI column are not (just a collection of keys). So either we want to handle the two (ordered vs unordered collection) differently (internally), then we should define different types and give separate names, or we don't care internally about ordered or unordered (as we currently do), but then we should not use different names.

What do you think?

TWal commented 1 month ago

I think there are two viewpoints to consider: what we see when writing the code, and what we see when doing the proofs. The viewpoint of the proofs is what you are explaining (the matrix of all states and all their versions). The viewpoint of the code is that we have a list of states, we can either expand the list with a new state, or update one of the states in the list. We only consider one version for each state in this list, namely, the latest one.

I think the new names should make sense in both viewpoints: one of the design goals that led to DY*-extrinsic is to have readable specification (free of any proof artifact), so the names used in the specifications should make sense for someone that is ignoring all the proof-related terminology.

In that regard, from the viewpoint of code, I think using the names "state" and "state identifier" is self-explanatory. Probably that the combination of the functions new_session_id and set_state is a bad design choice, and we would better use a combo of new_state (that would do new_session_id + set_state) and update_state (that would do set_state).

When considering the viewpoint of proofs, the list becomes a matrix because we consider all past versions of the states. I think it still makes sense to call "state" each cell of the matrix, because each cell, at some point in time, was in the list of states (from the viewpoint of code), therefore was the "state" at that time. In the world of proofs, state identifiers now point to a list of all versions of a given state, but I don't see it as a problem: it is the list of all versions of the state that were pointed by this state identifier at some point in time.

I don't understand why the list of all versions of a state would be called a "session". I have been trying to find definitions of what "session" means in computer science, and wikipedia seems to agree with the intuition that a session corresponds to one execution of a protocol (for a large notion of "protocol"). I couldn't find any definition supporting the intuition that it would correspond to all versions of a given state.