Open TWal opened 2 months ago
I'm noting down the thoughts we had so far. They are not completely thought through. Just some ideas.
Following the new structure of states (as proposed in https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/6#issuecomment-2077239762), we were thinking of splitting the state invariant into 3 layers (for versions, sessions, and full states). For each of the layers we would provide a way to specify user defined invariants wich are then composed to build the overall (internal) state invariant.
The user defined invariants would be:
The composition of those to the overall invariant would look like this:
let full_state_invariant state = full_state_invariant_user state /\ forall session in state. session_invariant session
let session_invariant session = session_invariant_user session /\ forall version in session. version_invariant version
let version_invariant version = version_invariant_user version
When creating new versions one now needs to check all invariants from all levels. (Note that adding a new version (to a new or an existing session) is the only operation we have in the new state model https://github.com/REPROSEC/dolev-yao-star-extrinsic/issues/6#issuecomment-2077239762 , where we keep the whole history of sessions.) To simplify this, we could use some compatibility lemmas the user has to show. For example
version_invariant_user version /\ session_invariant_user session
/\ version compatible_with session
==> session_invariant_user (add version to session)
, where version compatible_with session
could for example check whether the new counter in the new version is indeed larger than the one in the current version of the session.
(And a similar lemma for full state and session state compatibility.)
This is just a rough idea, but I think these compatibility predicates are somewhat related to your suggestion of the init and update predicates?
I agree, that showing monotonicity for new versions requires to solve the problems you mention with reasoning about get_state
.
We didn't think about this yet.
Judging from your experiment, I agree with enforcing the state invariant for growing traces as we currently have it.
I see, your proposition is much more generic than what I proposed initially if I understand things well:
version_invariant
is what the current state_invariant
is and we would still require that it is preserved by the trace growingsession_invariant
allows to express properties on all versions of a given session, in particular it is more generic than just monotonicityfull_state_invariant
allows to express properties on all versions of all sessions stored by a principal, which is more generic than simple uniqueness propertiesMaybe such generic predicates would work, but I am afraid that it would impose a lot of proof burden on the DY* users.
In the proposition I made above, I proposed to trade the global full_state_invariant
with a local init_invariant
which may be enough to prove uniqueness across states, as we do in NSL (the invariant, the injectivity property).
Do you think this could be enough for the usecases you are thinking of, or do you really need the fully generic predicates?
Yes, that's about what the three invariants would cover. The intent was that the framework would be able to be very general (indeed, as you say, this imposes a lot of proof burden on the users), but that we would provide standard ways of automating the common types of invariants.
The idea would be that, at the most basic level, if the user only needs local version_invariant
s, then there should be an interface that hides the remaining invariants for them.
Similarly, a user who has some involved version_invariant
, but a standard session_invariant
(e.g., some kind of monotonicity on fields) would be able to specify this when declaring the version
type, perhaps via decorations of some sort, and the corresponding session_invariant
could be generated from that information. It seems plausible that in this setting it would still be possible to have an interface that hides the unnecessary details from the user.
With that sort of structure, the maximal proof burden on the user is higher, but only up to the features that they actually need to use for a given protocol.
I think that the more restrictive system (as you suggest, possibly, or some other variant with similar constraints) would work for the use cases we currently have in mind (for now, at least, all of the session_invariant
s we have come up with are some form of monotonicity, and full_state_invariant
s all consist of some uniqueness-type properties). My only concern would be that it might be easier to start with the general system, rather than trying to fit it in later if it becomes necessary, but then this leads to an analysis of how likely it is to be needed vs. how much work it would take --- I don't have answers there offhand.
I see, it makes sense!
If we agree on the proposed generic invariant structure, I suggest the following high-level steps to proceed:
What do you think? Of course, details need to be added and discussed. But even on this high-level, are there steps missing, or things you would do differently?
I think this is a good roadmap, I just have a few comments:
Sure, we can keep the diff state storing and use "translation functions" as you mentioned.
I hope, we can keep the split predicate method from the beginning.
After a very brief look, I'd try to change session_pred
to contain the three invariants for full state, session and version (instead of the single pred
), and combine those in the apply_local_pred
in the split_predicate_input_values instance ?
I think it would work for the session and version predicate:
decode_tagged_data
because all versions must use the same tagHowever for full state I don't see how it fits with the split predicate method, I fear that we would need to write something custom for that?
I think, I now understand the problem with the full state: Currently, the split method relies on the tags on the data to distinguish which local predicate to call. However, right now, only versions have a tag (e.g., "NSL.Session" for an NSL version or "DY.Lib.State.PKI" for a PKI version).
If we now try to define (separate [1]) instances of split_predicate_input_values
for versions, sessions and full states, we would need tags for sessions and full states.
As you say, we could maybe define the tag of a session to be the tag of any of its versions (making the restriction, that a session can only contain version with the same tag -- which is probably ok?).
But for full states we don't have a way to specify a tag (since the state may contain sessions with different tags, for example some NSL protocol session but also some PKI sessions).
On the other hand, I think one question is: what should the local full state predicates express? Looking from the protocol level: I'd probably only like to locally define predicates relating different protocol sessions. (For example, some identifier stored in protocol sessions should be unique across different protocol sessions). Are there use cases where we would need to define properties across different types of sessions? E.g., on the protocol level we want uniqueness of some field not only w.r.t. the protocol sessions but also for, say, PKI sessions (something like: "we never use a private key as identifier in the protocol session")?
If we restrict to the former, we implicitly have a tag again.
split_predicate_input_values
for versions, sessions and full states ?One option, reusing as much code as is currently there, is to define separate instance for each of the levels (versions, sessions and full states).
The version instance is what we currently have.
The instance for sessions would probably work with raw_data = trace & principal & session_id & session_bytes
(where session_bytes
is collection of version bytes, e.g., session_bytes = list bytes
) and tagged_data = trace & principal & session_id & version_id & version_bytes
. The decode_data
would parse the version and return the tag and all other versions of the same session(_id).
This instance needs to be connected to the underlying version predicate.
Probably via apply_local
where we call the local session pred on the raw data (as we do right now) and add forall version in the_session: local_version_pred version
.
And similarly for the instance for full states:
Where raw_data = trace & principal & full_state_bytes
(full_state_bytes
is just a collection of sessions) and tagged_data
as above; decode
would return all sessions of the principal with the same tag as the parsed version -- assuming the restriction from above.
In apply_local
we would call the local full_state pred (on the collected sessions with the same tag) and add forall session in the_full_state: local_session_pred session
.
It is not completely clear to me how this connection to the other instances would work, i.e., where to put the information about the underlying version/session predicate?
Another option is to change the local_bytes_state_predicate
to contain separate version_pred
, session_pred
and full_state_pred
(instead of just the one pred
that we currently have), and probably corresponding later
lemmas. (We would also keep the knowable
portion which I would probably call version_pred_knowable
).
Then we have all of the predicates/invariants in one place. But then we have to heavily adapt split_predicate_input_values
.
I think for state history it may work to check the tag of the first version of the state (as you are proposing), or maybe to check tags of all versions of the state history and abort if they are not consistent. However for the full state I don't think the split predicate module can tackle it, because we can't extract a tag from a full state. Maybe it can be done by hand for that purpose, or in another split-predicate module, where you would extract a full state to a list (tag & (list state_history))
, and check each local predicate on the corresponding list state_history
?
If we have to come up with something new for the full states anyway, I think, I prefer to change the state_predicate
to include all of the invariants and adapt the split predicate method to these three invariants together. What do you think about this? Do you think it is feasible?
This probably also makes it easier to refer to the underlying invariants (the call to the session/history invariant in the full state invariant and the call to the version/state invariant in the session/history invariant).
Update on the progress (https://github.com/REPROSEC/dolev-yao-star-extrinsic/compare/main...6db8c7771a6bba68238aa7c8e71848f0c6bd10cb):
I added a session_pred
and full_state_pred
to the state predicate: https://github.com/REPROSEC/dolev-yao-star-extrinsic/blob/6db8c7771a6bba68238aa7c8e71848f0c6bd10cb/src/core/DY.Core.Trace.Invariant.fst#L50-L52
Those take as argument the current session and full state and check when a new state is allowed to be added (to a session in a full state of a principal).
[I'm using the terms "state", "session", and "full state" for now.]
From those, we build a global_state_pred
[i know, it's not a good name considering the split predicate method]
https://github.com/REPROSEC/dolev-yao-star-extrinsic/blob/6db8c7771a6bba68238aa7c8e71848f0c6bd10cb/src/core/DY.Core.Trace.Invariant.fst#L106-L112
The lemmas for the trace manipulation functions are adapted to this new state predicate.
For example, https://github.com/REPROSEC/dolev-yao-star-extrinsic/blob/6db8c7771a6bba68238aa7c8e71848f0c6bd10cb/src/core/DY.Core.Trace.Manipulation.fst#L443-L457
But since we keep the later
lemma for the single state/version pred, we still have that pred
holds on the whole trace (and not just on the prefix from above).
With this, the NSL example doesn't have to change.
Important note: all of the changes so far are on the lowest DY level, there are no adaptions to the split predicate yet.
I added a mini example https://github.com/REPROSEC/dolev-yao-star-extrinsic/blob/cwaldm/experimental/examples/invariants/DY.Example.Invariants.fst, where we have as a session invariant, that a counter must be increasing. The example contains one step, in which we read a state, set a new state, send messages and trigger an event, and the corresponding lemma, that the trace invariant is maintained.
Feel free to take a look at the current state and comment!
Next steps are:
Cool! It looks like interesting work :)
If I understand everything, the methodology is the following:
no_set_state_entry_for p sid (suffix_after tr2 tr1)
using some lemmas (e.g. no_set_state_entry_for_suffixes_transitive
, send_msg_sets_no_state
, etc)get_session_aux_same
, we know that the output of get_session
with p, sid, tr1 is the same with p, sid, tr2get_state_is_last_of_get_session
, we can prove the global_state_pred
for the next set_state
callIs that correct?
Some remarks about that:
..._sets_no_state
that instantiate a forall
in the post-condition of the lemma, I fear that it would lead to a lot of triggering and make the SMT overwhelmed when the functions grows. But I think we can refactor this to have tighter SMT patterns so this should be fixableno_set_state_entry_for_suffixes_transitive
or get_session_aux_same
, but this also looks fixablemodifies
theory in Low* (link to old tutorial), I think it would be interesting to chat with Low* experts and see if we can benefit from their experience, to see how it scalesOn a higher level, I was wondering what additional security properties this would allow us to prove? Although this certainly looks useful as a sanity check, I don't yet see what new security could be proved using this mechanism.
During the previous meetings, @qaphla and @cwaldm have been proposing extending the state invariant to support both states that are updated monotonically, and states that contain a unique value (across all states).
Because this is a recurring discussion, I think it is good to keep track of ideas somewhere, hence this issue. Here are some ideas on how to implement it.
We could extend the state invariants with two other predicates:
The second predicate already allows to encode monotonicity. I believe combining both predicates allow some flavor of enforcing values to be unique across states:
With this, we may say that if two states have the same unique value, then they have been initialized at the same timestamp (directly after the unique value was generated), hence they have the same state id.
@qaphla and @cwaldm would this framework fit the usecases you had in mind?
A next question is: on the user side, how do we prove that a state has been updated monotonically? This needs a new kind of reasoning that has not be tackled before in DY*, that is we have to reason on the output of
get_state
. For example the user may need to prove that sending a message on the network do not affect the output ofget_state
, that setting another state id do not affect the output ofget_state
, etc.I have not yet thought about that too much, I believe it is a deep rabbit-hole and it would be easy to be tempted to sparkle separation logic to solve it, but it is probably overkill :) . I have not yet thought too much about that however.
Somewhat related notes: after doing that we may think on doing cleanup, and a question is whether we need the current state invariant? We may want to fit it in the "update" predicate instead, and things would work the same? I think it is useful to keep both, because the state invariant is enforced to stay true when the trace grows. This is actually a long-standing TODO on whether we actually want to do this, so I did an experiment at tried to remove this restriction. Things are starting to become annoying when expressing the guarantees of
get_state
, we now cannot say that the output ofget_state
satisfy the state invariant with the current trace, we have to say that its output satisfy the state invariant with a prefix of the current trace. This existential has to be propagated in all the DY* code and protocol proofs, and because most state invariants are actually preserved when the trace grows, I think it is useful to keep it and remove the TODO.