REPROSEC / dolev-yao-star-extrinsic

DY* with extrinsic proofs
https://reprosec.org/
Mozilla Public License 2.0
8 stars 0 forks source link

Monotonic states and states with unique values #9

Open TWal opened 2 months ago

TWal commented 2 months ago

During the previous meetings, @qaphla and @cwaldm have been proposing extending the state invariant to support both states that are updated monotonically, and states that contain a unique value (across all states).

Because this is a recurring discussion, I think it is good to keep track of ideas somewhere, hence this issue. Here are some ideas on how to implement it.

We could extend the state invariants with two other predicates:

The second predicate already allows to encode monotonicity. I believe combining both predicates allow some flavor of enforcing values to be unique across states:

With this, we may say that if two states have the same unique value, then they have been initialized at the same timestamp (directly after the unique value was generated), hence they have the same state id.

@qaphla and @cwaldm would this framework fit the usecases you had in mind?

A next question is: on the user side, how do we prove that a state has been updated monotonically? This needs a new kind of reasoning that has not be tackled before in DY*, that is we have to reason on the output of get_state. For example the user may need to prove that sending a message on the network do not affect the output of get_state, that setting another state id do not affect the output of get_state, etc.

I have not yet thought about that too much, I believe it is a deep rabbit-hole and it would be easy to be tempted to sparkle separation logic to solve it, but it is probably overkill :) . I have not yet thought too much about that however.

Somewhat related notes: after doing that we may think on doing cleanup, and a question is whether we need the current state invariant? We may want to fit it in the "update" predicate instead, and things would work the same? I think it is useful to keep both, because the state invariant is enforced to stay true when the trace grows. This is actually a long-standing TODO on whether we actually want to do this, so I did an experiment at tried to remove this restriction. Things are starting to become annoying when expressing the guarantees of get_state, we now cannot say that the output of get_state satisfy the state invariant with the current trace, we have to say that its output satisfy the state invariant with a prefix of the current trace. This existential has to be propagated in all the DY* code and protocol proofs, and because most state invariants are actually preserved when the trace grows, I think it is useful to keep it and remove the TODO.

cwaldm commented 1 month ago

I'm noting down the thoughts we had so far. They are not completely thought through. Just some ideas.

Invariants we want to express

Structure of Invariants

Following the new structure of states (as proposed in https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/6#issuecomment-2077239762), we were thinking of splitting the state invariant into 3 layers (for versions, sessions, and full states). For each of the layers we would provide a way to specify user defined invariants wich are then composed to build the overall (internal) state invariant.

The user defined invariants would be:

The composition of those to the overall invariant would look like this:

let full_state_invariant state = full_state_invariant_user state /\ forall session in state. session_invariant session 

let session_invariant session = session_invariant_user session /\ forall version in session. version_invariant version

let version_invariant version = version_invariant_user version

Adding New Versions

When creating new versions one now needs to check all invariants from all levels. (Note that adding a new version (to a new or an existing session) is the only operation we have in the new state model https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/6#issuecomment-2077239762 , where we keep the whole history of sessions.) To simplify this, we could use some compatibility lemmas the user has to show. For example

version_invariant_user version /\ session_invariant_user session 
/\ version compatible_with session
==> session_invariant_user (add version to session)

, where version compatible_with session could for example check whether the new counter in the new version is indeed larger than the one in the current version of the session. (And a similar lemma for full state and session state compatibility.)

This is just a rough idea, but I think these compatibility predicates are somewhat related to your suggestion of the init and update predicates?

The User Side

I agree, that showing monotonicity for new versions requires to solve the problems you mention with reasoning about get_state. We didn't think about this yet.

Invariant for Growing Trace

Judging from your experiment, I agree with enforcing the state invariant for growing traces as we currently have it.

TWal commented 1 month ago

I see, your proposition is much more generic than what I proposed initially if I understand things well:

Maybe such generic predicates would work, but I am afraid that it would impose a lot of proof burden on the DY* users.

In the proposition I made above, I proposed to trade the global full_state_invariant with a local init_invariant which may be enough to prove uniqueness across states, as we do in NSL (the invariant, the injectivity property). Do you think this could be enough for the usecases you are thinking of, or do you really need the fully generic predicates?

qaphla commented 1 month ago

Yes, that's about what the three invariants would cover. The intent was that the framework would be able to be very general (indeed, as you say, this imposes a lot of proof burden on the users), but that we would provide standard ways of automating the common types of invariants.

The idea would be that, at the most basic level, if the user only needs local version_invariants, then there should be an interface that hides the remaining invariants for them.

Similarly, a user who has some involved version_invariant, but a standard session_invariant (e.g., some kind of monotonicity on fields) would be able to specify this when declaring the version type, perhaps via decorations of some sort, and the corresponding session_invariant could be generated from that information. It seems plausible that in this setting it would still be possible to have an interface that hides the unnecessary details from the user.

With that sort of structure, the maximal proof burden on the user is higher, but only up to the features that they actually need to use for a given protocol.

I think that the more restrictive system (as you suggest, possibly, or some other variant with similar constraints) would work for the use cases we currently have in mind (for now, at least, all of the session_invariants we have come up with are some form of monotonicity, and full_state_invariants all consist of some uniqueness-type properties). My only concern would be that it might be easier to start with the general system, rather than trying to fit it in later if it becomes necessary, but then this leads to an analysis of how likely it is to be needed vs. how much work it would take --- I don't have answers there offhand.

TWal commented 1 month ago

I see, it makes sense!

cwaldm commented 1 month ago

If we agree on the proposed generic invariant structure, I suggest the following high-level steps to proceed:

  1. change internally used state layout as proposed in https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/6#issuecomment-2077239762
  2. adapt the state invariant to the new state structure (as proposed in https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/9#issuecomment-2082755218 )
  3. adapt State API functions to the new invariant structure (using/requiring invariants and proofs for them on all levels, i.e., the full_state_invariant)
  4. simplified interface 1: add an interface for cases where only version invariants can be/need to be specified (that should behave like the current setting)
  5. simplified interface 2: add an interface for common invariants (uniqueness, monotonicity) ? This is most unclear and needs to be discussed again.
  6. automate generation of common invariants (uniqueness, set once, monotonicity) (as proposed in https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/9#issuecomment-2085543766)

What do you think? Of course, details need to be added and discussed. But even on this high-level, are there steps missing, or things you would do differently?

TWal commented 1 month ago

I think this is a good roadmap, I just have a few comments:

cwaldm commented 1 month ago

Sure, we can keep the diff state storing and use "translation functions" as you mentioned.

I hope, we can keep the split predicate method from the beginning. After a very brief look, I'd try to change session_pred to contain the three invariants for full state, session and version (instead of the single pred), and combine those in the apply_local_pred in the split_predicate_input_values instance ?

TWal commented 1 month ago

I think it would work for the session and version predicate:

However for full state I don't see how it fits with the split predicate method, I fear that we would need to write something custom for that?

cwaldm commented 1 month ago

I think, I now understand the problem with the full state: Currently, the split method relies on the tags on the data to distinguish which local predicate to call. However, right now, only versions have a tag (e.g., "NSL.Session" for an NSL version or "DY.Lib.State.PKI" for a PKI version).

If we now try to define (separate [1]) instances of split_predicate_input_values for versions, sessions and full states, we would need tags for sessions and full states.

As you say, we could maybe define the tag of a session to be the tag of any of its versions (making the restriction, that a session can only contain version with the same tag -- which is probably ok?).

But for full states we don't have a way to specify a tag (since the state may contain sessions with different tags, for example some NSL protocol session but also some PKI sessions).

On the other hand, I think one question is: what should the local full state predicates express? Looking from the protocol level: I'd probably only like to locally define predicates relating different protocol sessions. (For example, some identifier stored in protocol sessions should be unique across different protocol sessions). Are there use cases where we would need to define properties across different types of sessions? E.g., on the protocol level we want uniqueness of some field not only w.r.t. the protocol sessions but also for, say, PKI sessions (something like: "we never use a private key as identifier in the protocol session")?

If we restrict to the former, we implicitly have a tag again.


[1] separate instances of split_predicate_input_values for versions, sessions and full states ?

One option, reusing as much code as is currently there, is to define separate instance for each of the levels (versions, sessions and full states). The version instance is what we currently have. The instance for sessions would probably work with raw_data = trace & principal & session_id & session_bytes (where session_bytes is collection of version bytes, e.g., session_bytes = list bytes) and tagged_data = trace & principal & session_id & version_id & version_bytes. The decode_data would parse the version and return the tag and all other versions of the same session(_id). This instance needs to be connected to the underlying version predicate. Probably via apply_local where we call the local session pred on the raw data (as we do right now) and add forall version in the_session: local_version_pred version. And similarly for the instance for full states: Where raw_data = trace & principal & full_state_bytes (full_state_bytes is just a collection of sessions) and tagged_data as above; decode would return all sessions of the principal with the same tag as the parsed version -- assuming the restriction from above. In apply_local we would call the local full_state pred (on the collected sessions with the same tag) and add forall session in the_full_state: local_session_pred session.

It is not completely clear to me how this connection to the other instances would work, i.e., where to put the information about the underlying version/session predicate?

Another option is to change the local_bytes_state_predicate to contain separate version_pred, session_pred and full_state_pred (instead of just the one pred that we currently have), and probably corresponding later lemmas. (We would also keep the knowable portion which I would probably call version_pred_knowable). Then we have all of the predicates/invariants in one place. But then we have to heavily adapt split_predicate_input_values.

TWal commented 1 month ago

I think for state history it may work to check the tag of the first version of the state (as you are proposing), or maybe to check tags of all versions of the state history and abort if they are not consistent. However for the full state I don't think the split predicate module can tackle it, because we can't extract a tag from a full state. Maybe it can be done by hand for that purpose, or in another split-predicate module, where you would extract a full state to a list (tag & (list state_history)), and check each local predicate on the corresponding list state_history?

cwaldm commented 1 month ago

If we have to come up with something new for the full states anyway, I think, I prefer to change the state_predicate to include all of the invariants and adapt the split predicate method to these three invariants together. What do you think about this? Do you think it is feasible? This probably also makes it easier to refer to the underlying invariants (the call to the session/history invariant in the full state invariant and the call to the version/state invariant in the session/history invariant).

cwaldm commented 1 week ago

Update on the progress (https://github.com/REPROSEC/dolev-yao-star-extrinsic/compare/main...6db8c7771a6bba68238aa7c8e71848f0c6bd10cb):

I added a session_pred and full_state_pred to the state predicate: https://github.com/REPROSEC/dolev-yao-star-extrinsic/blob/6db8c7771a6bba68238aa7c8e71848f0c6bd10cb/src/core/DY.Core.Trace.Invariant.fst#L50-L52 Those take as argument the current session and full state and check when a new state is allowed to be added (to a session in a full state of a principal). [I'm using the terms "state", "session", and "full state" for now.] From those, we build a global_state_pred [i know, it's not a good name considering the split predicate method] https://github.com/REPROSEC/dolev-yao-star-extrinsic/blob/6db8c7771a6bba68238aa7c8e71848f0c6bd10cb/src/core/DY.Core.Trace.Invariant.fst#L106-L112

The lemmas for the trace manipulation functions are adapted to this new state predicate. For example, https://github.com/REPROSEC/dolev-yao-star-extrinsic/blob/6db8c7771a6bba68238aa7c8e71848f0c6bd10cb/src/core/DY.Core.Trace.Manipulation.fst#L443-L457 But since we keep the later lemma for the single state/version pred, we still have that pred holds on the whole trace (and not just on the prefix from above). With this, the NSL example doesn't have to change.

Important note: all of the changes so far are on the lowest DY level, there are no adaptions to the split predicate yet.

I added a mini example https://github.com/REPROSEC/dolev-yao-star-extrinsic/blob/cwaldm/experimental/examples/invariants/DY.Example.Invariants.fst, where we have as a session invariant, that a counter must be increasing. The example contains one step, in which we read a state, set a new state, send messages and trigger an event, and the corresponding lemma, that the trace invariant is maintained.

Feel free to take a look at the current state and comment!

Next steps are:

TWal commented 1 week ago

Cool! It looks like interesting work :)

If I understand everything, the methodology is the following:

  1. if events between tr1 and tr2 only contain non-state-setting events (or state setting events for other states), prove no_set_state_entry_for p sid (suffix_after tr2 tr1) using some lemmas (e.g. no_set_state_entry_for_suffixes_transitive, send_msg_sets_no_state, etc)
  2. using get_session_aux_same, we know that the output of get_session with p, sid, tr1 is the same with p, sid, tr2
  3. thanks to get_state_is_last_of_get_session, we can prove the global_state_pred for the next set_state call

Is that correct?

Some remarks about that:

On a higher level, I was wondering what additional security properties this would allow us to prove? Although this certainly looks useful as a sanity check, I don't yet see what new security could be proved using this mechanism.