Monotonic states and states with unique values

REPROSEC / dolev-yao-star-extrinsic

DY* with extrinsic proofs

https://reprosec.org/

Mozilla Public License 2.0

8 stars 0 forks source link

Monotonic states and states with unique values #9

Open TWal opened 2 months ago

TWal commented 2 months ago

During the previous meetings, @qaphla and @cwaldm have been proposing extending the state invariant to support both states that are updated monotonically, and states that contain a unique value (across all states).

Because this is a recurring discussion, I think it is good to keep track of ideas somewhere, hence this issue. Here are some ideas on how to implement it.

We could extend the state invariants with two other predicates:

a predicate that must be satisfied when a state is initialized
a predicate that must be satisfied when a state is updated

The second predicate already allows to encode monotonicity. I believe combining both predicates allow some flavor of enforcing values to be unique across states:

the "update" predicate may say that the unique value is unchanged when the state is updated
the "init" predicate may say that the unique value is a random that was generated directly before in the trace, as we do in NSL for example

With this, we may say that if two states have the same unique value, then they have been initialized at the same timestamp (directly after the unique value was generated), hence they have the same state id.

@qaphla and @cwaldm would this framework fit the usecases you had in mind?

A next question is: on the user side, how do we prove that a state has been updated monotonically? This needs a new kind of reasoning that has not be tackled before in DY*, that is we have to reason on the output of get_state. For example the user may need to prove that sending a message on the network do not affect the output of get_state, that setting another state id do not affect the output of get_state, etc.

I have not yet thought about that too much, I believe it is a deep rabbit-hole and it would be easy to be tempted to sparkle separation logic to solve it, but it is probably overkill :) . I have not yet thought too much about that however.

Somewhat related notes: after doing that we may think on doing cleanup, and a question is whether we need the current state invariant? We may want to fit it in the "update" predicate instead, and things would work the same? I think it is useful to keep both, because the state invariant is enforced to stay true when the trace grows. This is actually a long-standing TODO on whether we actually want to do this, so I did an experiment at tried to remove this restriction. Things are starting to become annoying when expressing the guarantees of get_state, we now cannot say that the output of get_state satisfy the state invariant with the current trace, we have to say that its output satisfy the state invariant with a prefix of the current trace. This existential has to be propagated in all the DY* code and protocol proofs, and because most state invariants are actually preserved when the trace grows, I think it is useful to keep it and remove the TODO.

cwaldm commented 1 month ago

I'm noting down the thoughts we had so far. They are not completely thought through. Just some ideas.

Invariants we want to express

Uniqueness Properties: for example, a code used in one session is not re-used in any other session (of the same participant) (Note: there could be more than one unique fields in a session -- for example, the user code and device code in the OAuth 2.0 Device Authorization Grant Flow)
Monotonicity Properties: for example, some value only increases within one session (Note: This includes the case of "set once", where a value is set only once (not neccessarily at initialization) in a session and never changed after that. For example, the nonce n_b in NSL is set at the initiator A when receiving the response from B and it is not changed after that.)

Structure of Invariants

Following the new structure of states (as proposed in https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/6#issuecomment-2077239762), we were thinking of splitting the state invariant into 3 layers (for versions, sessions, and full states). For each of the layers we would provide a way to specify user defined invariants wich are then composed to build the overall (internal) state invariant.

The user defined invariants would be:

a _version_invariantuser relating fields of single versions (that's what we currently have as session state invariant)
a _session_invariantuser relating fields across versions of a session (this could be used for monotonicity properties, i.e., a counter ownly grows within one session)
a _full_state_invariantuser, relating several sessions of a participant (this can be used for uniqueness properties across sessions, i.e., a session id used in session A is not used in any other session)

The composition of those to the overall invariant would look like this:

let full_state_invariant state = full_state_invariant_user state /\ forall session in state. session_invariant session 

let session_invariant session = session_invariant_user session /\ forall version in session. version_invariant version

let version_invariant version = version_invariant_user version

Adding New Versions

When creating new versions one now needs to check all invariants from all levels. (Note that adding a new version (to a new or an existing session) is the only operation we have in the new state model https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/6#issuecomment-2077239762 , where we keep the whole history of sessions.) To simplify this, we could use some compatibility lemmas the user has to show. For example

version_invariant_user version /\ session_invariant_user session 
/\ version compatible_with session
==> session_invariant_user (add version to session)

, where version compatible_with session could for example check whether the new counter in the new version is indeed larger than the one in the current version of the session. (And a similar lemma for full state and session state compatibility.)

This is just a rough idea, but I think these compatibility predicates are somewhat related to your suggestion of the init and update predicates?

The User Side

I agree, that showing monotonicity for new versions requires to solve the problems you mention with reasoning about get_state. We didn't think about this yet.

Invariant for Growing Trace

Judging from your experiment, I agree with enforcing the state invariant for growing traces as we currently have it.

TWal commented 1 month ago

I see, your proposition is much more generic than what I proposed initially if I understand things well:

version_invariant is what the current state_invariant is and we would still require that it is preserved by the trace growing
session_invariant allows to express properties on all versions of a given session, in particular it is more generic than just monotonicity
full_state_invariant allows to express properties on all versions of all sessions stored by a principal, which is more generic than simple uniqueness properties

Maybe such generic predicates would work, but I am afraid that it would impose a lot of proof burden on the DY* users.

In the proposition I made above, I proposed to trade the global full_state_invariant with a local init_invariant which may be enough to prove uniqueness across states, as we do in NSL (the invariant, the injectivity property). Do you think this could be enough for the usecases you are thinking of, or do you really need the fully generic predicates?

qaphla commented 1 month ago

Yes, that's about what the three invariants would cover. The intent was that the framework would be able to be very general (indeed, as you say, this imposes a lot of proof burden on the users), but that we would provide standard ways of automating the common types of invariants.

The idea would be that, at the most basic level, if the user only needs local version_invariants, then there should be an interface that hides the remaining invariants for them.

Similarly, a user who has some involved version_invariant, but a standard session_invariant (e.g., some kind of monotonicity on fields) would be able to specify this when declaring the version type, perhaps via decorations of some sort, and the corresponding session_invariant could be generated from that information. It seems plausible that in this setting it would still be possible to have an interface that hides the unnecessary details from the user.

With that sort of structure, the maximal proof burden on the user is higher, but only up to the features that they actually need to use for a given protocol.

I think that the more restrictive system (as you suggest, possibly, or some other variant with similar constraints) would work for the use cases we currently have in mind (for now, at least, all of the session_invariants we have come up with are some form of monotonicity, and full_state_invariants all consist of some uniqueness-type properties). My only concern would be that it might be easier to start with the general system, rather than trying to fit it in later if it becomes necessary, but then this leads to an analysis of how likely it is to be needed vs. how much work it would take --- I don't have answers there offhand.

TWal commented 1 month ago

I see, it makes sense!

cwaldm commented 1 month ago

If we agree on the proposed generic invariant structure, I suggest the following high-level steps to proceed:

change internally used state layout as proposed in https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/6#issuecomment-2077239762
adapt the state invariant to the new state structure (as proposed in https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/9#issuecomment-2082755218 )
adapt State API functions to the new invariant structure (using/requiring invariants and proofs for them on all levels, i.e., the full_state_invariant)
simplified interface 1: add an interface for cases where only version invariants can be/need to be specified (that should behave like the current setting)
simplified interface 2: add an interface for common invariants (uniqueness, monotonicity) ? This is most unclear and needs to be discussed again.
automate generation of common invariants (uniqueness, set once, monotonicity) (as proposed in https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/9#issuecomment-2085543766)

What do you think? Of course, details need to be added and discussed. But even on this high-level, are there steps missing, or things you would do differently?

TWal commented 1 month ago

I think this is a good roadmap, I just have a few comments:

I would prefer to stay with the "diff" version of state storage, if it doesn't make the proofs too ugly. I can help with that, e.g. I can write a function taking a trace and a principal that outputs the double-entry array of all versions of all states.
An open question is whether we can do the split predicate thing before the simplified interface, or do we have no choice but do it on the simplified interface (in which case this work would have to be duplicated)?

cwaldm commented 1 month ago

Sure, we can keep the diff state storing and use "translation functions" as you mentioned.

I hope, we can keep the split predicate method from the beginning. After a very brief look, I'd try to change session_pred to contain the three invariants for full state, session and version (instead of the single pred), and combine those in the apply_local_pred in the split_predicate_input_values instance ?

TWal commented 1 month ago

I think it would work for the session and version predicate:

the version predicate correspond to the current thing we have
for the session predicate, I guess we can write decode_tagged_data because all versions must use the same tag

However for full state I don't see how it fits with the split predicate method, I fear that we would need to write something custom for that?

cwaldm commented 1 month ago

I think, I now understand the problem with the full state: Currently, the split method relies on the tags on the data to distinguish which local predicate to call. However, right now, only versions have a tag (e.g., "NSL.Session" for an NSL version or "DY.Lib.State.PKI" for a PKI version).

If we now try to define (separate [1]) instances of split_predicate_input_values for versions, sessions and full states, we would need tags for sessions and full states.

As you say, we could maybe define the tag of a session to be the tag of any of its versions (making the restriction, that a session can only contain version with the same tag -- which is probably ok?).

But for full states we don't have a way to specify a tag (since the state may contain sessions with different tags, for example some NSL protocol session but also some PKI sessions).

On the other hand, I think one question is: what should the local full state predicates express? Looking from the protocol level: I'd probably only like to locally define predicates relating different protocol sessions. (For example, some identifier stored in protocol sessions should be unique across different protocol sessions). Are there use cases where we would need to define properties across different types of sessions? E.g., on the protocol level we want uniqueness of some field not only w.r.t. the protocol sessions but also for, say, PKI sessions (something like: "we never use a private key as identifier in the protocol session")?

If we restrict to the former, we implicitly have a tag again.

[1] separate instances of `split_predicate_input_values` for versions, sessions and full states ?

One option, reusing as much code as is currently there, is to define separate instance for each of the levels (versions, sessions and full states). The version instance is what we currently have. The instance for sessions would probably work with raw_data = trace & principal & session_id & session_bytes (where session_bytes is collection of version bytes, e.g., session_bytes = list bytes) and tagged_data = trace & principal & session_id & version_id & version_bytes. The decode_data would parse the version and return the tag and all other versions of the same session(_id). This instance needs to be connected to the underlying version predicate. Probably via apply_local where we call the local session pred on the raw data (as we do right now) and add forall version in the_session: local_version_pred version. And similarly for the instance for full states: Where raw_data = trace & principal & full_state_bytes (full_state_bytes is just a collection of sessions) and tagged_data as above; decode would return all sessions of the principal with the same tag as the parsed version -- assuming the restriction from above. In apply_local we would call the local full_state pred (on the collected sessions with the same tag) and add forall session in the_full_state: local_session_pred session.

It is not completely clear to me how this connection to the other instances would work, i.e., where to put the information about the underlying version/session predicate?

Another option is to change the local_bytes_state_predicate to contain separate version_pred, session_pred and full_state_pred (instead of just the one pred that we currently have), and probably corresponding later lemmas. (We would also keep the knowable portion which I would probably call version_pred_knowable). Then we have all of the predicates/invariants in one place. But then we have to heavily adapt split_predicate_input_values.

TWal commented 1 month ago

I think for state history it may work to check the tag of the first version of the state (as you are proposing), or maybe to check tags of all versions of the state history and abort if they are not consistent. However for the full state I don't think the split predicate module can tackle it, because we can't extract a tag from a full state. Maybe it can be done by hand for that purpose, or in another split-predicate module, where you would extract a full state to a list (tag & (list state_history)), and check each local predicate on the corresponding list state_history?

cwaldm commented 1 month ago

If we have to come up with something new for the full states anyway, I think, I prefer to change the state_predicate to include all of the invariants and adapt the split predicate method to these three invariants together. What do you think about this? Do you think it is feasible? This probably also makes it easier to refer to the underlying invariants (the call to the session/history invariant in the full state invariant and the call to the version/state invariant in the session/history invariant).

cwaldm commented 1 week ago

Update on the progress (https://github.com/REPROSEC/dolev-yao-star-extrinsic/compare/main...6db8c7771a6bba68238aa7c8e71848f0c6bd10cb):

I added a session_pred and full_state_pred to the state predicate: https://github.com/REPROSEC/dolev-yao-star-extrinsic/blob/6db8c7771a6bba68238aa7c8e71848f0c6bd10cb/src/core/DY.Core.Trace.Invariant.fst#L50-L52 Those take as argument the current session and full state and check when a new state is allowed to be added (to a session in a full state of a principal). [I'm using the terms "state", "session", and "full state" for now.] From those, we build a global_state_pred [i know, it's not a good name considering the split predicate method] https://github.com/REPROSEC/dolev-yao-star-extrinsic/blob/6db8c7771a6bba68238aa7c8e71848f0c6bd10cb/src/core/DY.Core.Trace.Invariant.fst#L106-L112

The lemmas for the trace manipulation functions are adapted to this new state predicate. For example, https://github.com/REPROSEC/dolev-yao-star-extrinsic/blob/6db8c7771a6bba68238aa7c8e71848f0c6bd10cb/src/core/DY.Core.Trace.Manipulation.fst#L443-L457 But since we keep the later lemma for the single state/version pred, we still have that pred holds on the whole trace (and not just on the prefix from above). With this, the NSL example doesn't have to change.

Important note: all of the changes so far are on the lowest DY level, there are no adaptions to the split predicate yet.

I added a mini example https://github.com/REPROSEC/dolev-yao-star-extrinsic/blob/cwaldm/experimental/examples/invariants/DY.Example.Invariants.fst, where we have as a session invariant, that a counter must be increasing. The example contains one step, in which we read a state, set a new state, send messages and trigger an event, and the corresponding lemma, that the trace invariant is maintained.

Feel free to take a look at the current state and comment!

Next steps are:

add a full state invariant to the mini example
adapt the split predicate method (at least for session preds)
simplified interface for only single state preds

TWal commented 1 week ago

Cool! It looks like interesting work :)

If I understand everything, the methodology is the following:

if events between tr1 and tr2 only contain non-state-setting events (or state setting events for other states), prove no_set_state_entry_for p sid (suffix_after tr2 tr1) using some lemmas (e.g. no_set_state_entry_for_suffixes_transitive, send_msg_sets_no_state, etc)
using get_session_aux_same, we know that the output of get_session with p, sid, tr1 is the same with p, sid, tr2
thanks to get_state_is_last_of_get_session, we can prove the global_state_pred for the next set_state call

Is that correct?

Some remarks about that:

About the SMT patterns of ..._sets_no_state that instantiate a forall in the post-condition of the lemma, I fear that it would lead to a lot of triggering and make the SMT overwhelmed when the functions grows. But I think we can refactor this to have tighter SMT patterns so this should be fixable
On the other hand, I think there should be SMT patterns for some lemmas that we don't want to instantiate by hand, e.g. no_set_state_entry_for_suffixes_transitive or get_session_aux_same, but this also looks fixable
This methodology reminds me of the modifies theory in Low* (link to old tutorial), I think it would be interesting to chat with Low* experts and see if we can benefit from their experience, to see how it scales

On a higher level, I was wondering what additional security properties this would allow us to prove? Although this certainly looks useful as a sanity check, I don't yet see what new security could be proved using this mechanism.

REPROSEC / dolev-yao-star-extrinsic

Monotonic states and states with unique values #9

Invariants we want to express

Structure of Invariants

Adding New Versions

The User Side

Invariant for Growing Trace

[1] separate instances of split_predicate_input_values for versions, sessions and full states ?

[1] separate instances of `split_predicate_input_values` for versions, sessions and full states ?