Open dominique-unruh opened 1 year ago
Thanks. Bumping this, and making a note that it needs thought—it looks like keeping the fine-grained cloning behaviour (which we do want in general) without keeping the unsoundness might require leaking information about the proof up through the lemma, which is not ideal.
All this fine grained stuffs are for diamond cloning. Beside for algebraic structures, what are the other scenarios where we use diamond cloning?
Once type-classes are in main
, I am really in favor of simplifying & documenting the cloning behavior. It is currently to brittle.
Coming to this bug, a quick fix is to disable structural equivalence of modules when comparing modules.
@chdoc What about Dilithium? Do you use diamong cloning with modules?
We're using diamonds in crypto_box and possibly XMSS. In crypto_box, we can replace them with transfer lemmas (they were transfer lemmas to begin with, of the form "P holds on A, therefore P holds on A'" and discharged by sim
). In genera transfer lemmas are a viable but verbose way of dealing with the sound cases.
Perhaps we could generate and attempt to discharge transfer lemmas when coalescing equivalent modules?
@chdoc What about Dilithium? Do you use diamond cloning with modules?
If you mean theory cloning with theory A <- B
where A
and B
contain modules, I'm pretty sure we do.
I'm sure all of you realise this already, but it's not just diamond cloning and theory instantiation. It's any clone involving module instantiation.
A possible patchwork fix that would keep module instantiation working would be to prevent the coexistence of unifiable modules at the predicate, module, lemma and proof level. This would require tracking which modules are used recursively and emitting an error when two modules that are used recursively are unifiable. Maybe this is what you were referring to @fdupress ?
I said two things. Both of them are attempts at salvaging some module merging, and I'm starting to edge towards the "no, let's not do it" side.
The first one was that we could tag lemmas with disjointness facts that were used in a proof. This is both impractical and gross.
The second one was that we could replace the convertibility check with a check that all lemmas that refer to the module being instantiated also hold on the module it is being instantiated with. (And recursively with modules that call them directly.) In the sound cases, all proofs of that form should be easily discharged by transitivity and sim. But it will imply a relatively massive proof cost. There will also be quite a lot of engineering work to do to make sure that those proofs are carried out to justify the instantiation, but do not add actual lemmas to our environment.
I also don't know just yet if this works in the presence of sections: there could be local lemmas inside a section that serve to prove a non-local fact that does not itself refer to the module, but still stops holding upon instantiation.
Program variables have always been a bit special logically (and have always confused me a bit) because they are "unique objects". That is, when I declare two program variables in an identical way, EasyCrypt still knows that they are non-equal. This is something that cannot easily be reproduced in other logics. E.g., if in Coq, Lean, Isabelle/HOL, etc., I write the same declaration twice, there is no way to prove that the two declared objects are not equal. This is also why it might be difficult to do 1-to-1 ports of EasyCrypt developments into other provers, even if they have a powerful type-system that could mimic theory cloning and similar things.
I think that's also the reason for the problem here. Path-based uniqueness of variables does not play well with cloning and substitutions etc. And I think it's difficult to give a solution with a clear semantics.
Maybe one way (but definitely not a quick fix) would be to rethink the automatic variable distinctness. That is, by default, two variables are only known to be distinct if they have different names. (Here the name of A.B.C.x is x.) So if I write two modules EasyCrypt will not automatically know that they are different. But, as is already the case currently, we can use type annotations to make distinctness explicit. For example, module A : {-B} = {...}
would annotate A to have type {-B}
, i.e., no shared variables with B. (This is a shortened form of A : T{-B}
because in this specific case the type T
is not needed.) Now instantiating modules when cloning is harmless because we can only do it when the new module has the declared type of the old module.
Now, you may rightly say that this leads to the need for a lot of extra annotations. Maybe too much. This could be resolved as follows: In non-abstract theories (i.e., theories that are not abstract and not descendents of abstract theories), EasyCrypt internally generates disjointness annotations automatically when a new module is given. (Basically, when two modules have a different path through non-clones theories, then EC adds a disjointness into their types.) This does not even need to be explicitly printed, it will just be there. But it is in a sense syntactic sugar because we could rewrite everything to just have those disjointnesses declared explicitly; we just choose for convenience that EasyCrypt does it for us.
This would mean that unless we have modules inside abstract theories, all stays as before. And inside abstract theories, we can still use them if we add the disjointnesses we need explicitly.
Some additional notes:
@oskgo Yes, we have realized that this is not specific to diamond cloning. But this kind of module instantiation often arose in diamond clones.
@dominique-unruh What you propose is definitively not a quick fix :)
@chdoc Could you be more specific. For example, as the modules stateful?
@dominique-unruh It would suffice to not decide whether two program variables with the same name are distinct in the case where they are contained in unifiable modules, no? This would drastically reduce the annotation burden, and you could just add some noops to prevent unification instead of mentioning the other modules.
In essence you can then chose between having distinct program variables or having identical modules (for module instantiation), but can't have both.
@chdoc Could you be more specific. For example, as the modules stateful?
Here is an excerpt from the SimplifiedScheme
part:
clone IDS as DID with
type PK <= PK,
type SK <= SK,
type W <= commit_t,
type C <= challenge_t,
type Z <= response_t,
type Pstate <= pstate_t proof*.
clone import FSabort as FSa with
theory ID <= DID,
type M <= M,
op dC <= dC tau
proof* by smt(dC_ll dC_uni tau_bound).
This is where we instantiate the generic security proof for FSwA with the (types of) the concrete IDS underlying Dilithium.
So the theory being cloned separately and then instantiated is IDS
which contains the definition of ID schemes (module types, hence stateless) and the security notions (HVZK Sim and the HVZK game; necessarily stateful)
Not sure whether this can be (easily) refactored in such a way that it does not substitute (stateful) modules. :thinking:
@oskgo It might work, but I don't know if one can get a clear theoretical foundation for it. My expectation would be that it would be possible to write a paper describing the meaning of those things (theories, clones, modules), and then one can just check in any given situation whether a certain tactic or command is in line with this. Hence the idea of having under the hood always explicit disjointness annotations. And everything else is reduced to that, possibly in a user-friendly encapsulation (that still would need to be clearly specified). Other approaches would seem like hotfixes to me. We don't know whether it only resolves the specific counterexample I found, or whether it really is sound.
(An example of this is a bug with the unsoundness of glob
which I found about 10 years ago (can't find it because the old tracker isn't online anymore), and which was solved, and was found again a decade later (https://github.com/EasyCrypt/easycrypt/issues/102) only in a more complicated to exploit form.)
(Also, "not unifiable" may not be a very common condition: module A = {var x:int, proc f() ...}
and module B = {var x:int, proc g() ...}
are very different but unifiable by module AB = {var x:int, proc f() ..., proc g() ...}
. So to show non-unifiability, we'd need to have procedures with explicitly conflicting signatures, for example. Making the latter a requirement for instantiation would seem very confusing.)
@dominique-unruh Maybe unifiable was the wrong word. I meant whatever criterion is used when permitting module instantiation. The error messages use "Compatible" and "incompatible". As far as I can tell module instantiation only works if the modules are exactly the same at the AST level (except for the module name).
@oskgo You are right. I had assumed that you can instantiate a module with one that has additional procedures (but doesn't remove or change any). That would have seemed ok because it's a refinement. But a quick test shows that apparently this is not allowed. So my last comment can be disregarded.
It might work, but I don't know if one can get a clear theoretical foundation for it. My expectation would be that it would be possible to write a paper describing the meaning of those things (theories, clones, modules), and then one can just check in any given situation whether a certain tactic or command is in line with this. Hence the idea of having under the hood always explicit disjointness annotations. And everything else is reduced to that, possibly in a user-friendly encapsulation (that still would need to be clearly specified). Other approaches would seem like hotfixes to me. We don't know whether it only resolves the specific counterexample I found, or whether it really is sound.
@dominique-unruh What do you think is the underlying issue here if it isn't the fact that disjointness and instantiability are allowed at the same time? I don't see how limiting your solution to mutually instantiable modules makes it any more of a hotfix. I agree that a formalization would be nice to have but an idea of disjointness XOR instantiability should fit right in IMO.
@oskgo I don't have a reason why your idea might not work. Maybe it does. I can agree with the following statement: "I cannot see any way how this approach goes wrong." However, I feel that's not the right methodology. Because there are many features in EasyCrypt, they can interact in unexpected ways. So the only way to have a reliable system that I see is not to fix unsoundness when we encounter it but instead have some foundation from which we can understand why a certain thing is allowed.
In other words, for me, the bug does not come from the interaction of cloning and modules. The bug comes from the lack of specification what the things mean. The cloning and module interaction is merely a symptom.
Reading https://github.com/EasyCrypt/easycrypt/issues/380#issuecomment-1553739083, I'm beginning to think that removing module substitution (and by extension theory substitution for theories with modules) might be the less invasive thing to do.
If I understand correctly, the main use of this feature is to align different clones of the same theory, which is why we get away with requiring that the modules are exactly the same.
Do we have any idea of how widely this "feature" is used in the wild and how painful it would be to just not have module substitution and do manual "transitivity by sim" steps where needed? Should we create a branch where module substitution emits a warning, so that we can get an estimate of how big the impact would be?
I agree that a mathematical model of modules, theories, cloning, and the interaction with (p)(R)HL would be desirable. However, I also think that we should not wait until that materializes before removing this known unsoundness.
Not sure whether this can be (easily) refactored in such a way that it does not substitute (stateful) modules.
Having a separate (sub-)theory that contains only the types, then clone include
ing it should allow you to substitute only the signatures, but those don't actually need substituted because typing is structural. (Having them merged does avoid proliferation, though.)
Now the main issue you will face is the fact that you'll want to apply lemmas you have proved about A.M.f
to A'.M.f
instead. (Which is, incidentally, the bit that's unsound if we don't check everything works fine, which we don't.) This is where rewrite equiv
or simply transitivity
will come in handy, but not be entirely sufficient: if you use them naively, you'll end up having both A.M.f
and A'.M.f
appear in your adversarial restrictions in top-level lemmas, even though A.M.f
might be only a proof artefact.
We can deal with all of this without substitution (in the sound case anyway), but I don't think there will be a one-size-fits-all solution to propose and apply.
Do we have any idea of how widely this "feature" is used in the wild and how painful it would be to just not have module substitution and do manual "transitivity by sim" steps where needed?
The above should answer "how painful": it's like ripping a band aid off; it depends how hairy the proof is, and how deeply embedded the glue has become.
As for "how widely": most of our larger proofs pre-date the ability to merge identical cloned modules. XMSS, Dilithium and Kyber will need care; I know that the crypto_box proof goes through without because I rewrote it to make use of merging from an existing working proof (that doesn't make it painless, but that makes it less terrifying). I might have also gone for it when merging ChaCha-Poly (in examples). The standard library should be fine (because most of it pre-dates the ability to merge identical cloned modules and we are slow at embracing our own changes).
The above should answer "how painful": it's like ripping a band aid off; it depends how hairy the proof is, and how deeply embedded the glue has become.
:fearful:
The standard library should be fine (because most of it pre-dates the ability to merge identical cloned modules and we are slow at embracing our own changes).
Turns out, that may not be such a bad thing after all.
If someone creates a patch causing module M <- N
to emit a warning, I will apply it to the branch needed for Dilithium and see how much refactoring this entails.
When cloning while instantiating a subtheory of the cloned abstract theory, formerly distinct program variables can become the same variables. Since when reasoning about programs, distinctness and sameness are distinguishable, we get contradictions.
The following theory illustrates this problem (tested with 860dc3f):