Open mn200 opened 5 years ago
To make progress on this issue, I think we will need to have a discussion on what miscTheory
is for and related design decisions.
I do not agree with the issues as it stands, i.e., I think miscTheory
should be allowed to define constants, and furthermore, I think all of HOL should be assumed to be in scope throughout CakeML.
But I also think miscTheory
should be upstreamed out of existence, so any rules about it should make sense given that transience.
I don't know where you think all of misc
is going to be upstreamed to, because there are theorems in it that are never going to make it to HOL
Developing with all of HOL in scope is a major pain and a maintainability disaster. If I understand you right, you'd advocate for set_grammar_ancestry ["misc", ...]
(bringing misc
into scope grammatically). If you do this, you can't use the variable name last
(for example) because that's defined in path
, even if, conceptually, your script has nothing to do with paths. A change to HOL's theory of bags can break a script file that never mentions or uses bags.
Are you really in favour of making every developer of every script file aware of every change to every theory in HOL, and to have them change their variable names when new constants are added to theories about which they have never needed to be concerned?
I don't want CakeML developers to have to worry about changes to HOL (or other) theories that have no direct bearing on the theory they are working on. We agree on that goal.
I believe there's little (though admittedly non-zero) harm in having theories open. The harm, as you've rightly pointed out, comes from the parser (and possibly other) state. I'll get to the parser state below. Modulo state, I've been in favour of treating all HOL theories as if they are collected together in one giant import. I think this is conceptually clean if you view HOL as providing non-CakeML logical support. However, I can also see the appeal of the view that every theory is an independent module no matter which repository it lives in. I'd say a disadvantage of that view is giant, unwieldy open
lines, but policy design for highly dependent theories is an under-explored area and I wouldn't be surprised if we could come up with a nice clean modular approach (perhaps in a separate discussion).
I think the parser state issue is nicely solved by set_grammar_ancestry
. As you say, this doesn't work if you include "misc"
in the list and miscTheory
defines constants. Hence this issue. Preventing miscTheory
from defining constants is a reasonable solution. Another solution that comes to mind is to temp_overload_on
the relevant constants where needed, in addition to a set_grammar_ancestry
without "misc"
.
So why did I initially express disagreement? I will try to explain more below. Note that I'm using this discussion to better understand the issue and come to an informed decision: I appreciate your input on this :)
I view miscTheory
as something transient and ultimately unnecessary. Therefore, I am averse to the idea of policy mentioning it, such as "miscTheory shouldn't define constants". Of course, while miscTheory
is very real and here right now, perhaps we do need some policies; but I'd rather find a way to speed up its disappearance. In the interim, I would suggest prioritising moving out constant definitions (in line with this issue's suggested policy), and using the overloading solution I mentioned above otherwise.
In response to your point (1), those theorems should be removed somehow. If they can't be moved to HOL, I presume it's because they're too ugly or specific. Are there other reasons? If too ugly, can a cleaner/nicer version be proved and used instead? If too specific, can they be moved back as Q.prove
lemmas near their site of use (and if there's more than one site of use, are they really too specific)? (I have previously been against all uses of Q.prove
; if it turns out there is a strong need for highly specific non-reusable lemmas, I could be convinced this is a job for Q.prove
.)
(In response to your point (2), I hope it's clear from the above that I do not advocate set_grammar_ancestry ["misc", ...]
.)
I think it's important to distinguish between the ideal state and the current state. When I say I think miscTheory
should be allowed to define constants, I mean that for the current state where miscTheory
is still around and being used as a clearinghouse. In the ideal state (or my version of it), miscTheory
should not define constants, as per this issue, because it should not exist at all.
I see two uses for miscTheory
. First and less importantly, it serves a role similar to bossLib
for those who can't be bothered to start every theory with an opening salvo like
open HolKernel bossLib boolLib boolSimps lcsymtacs Parse libTheory
open bagTheory optionTheory combinTheory dep_rewrite listTheory pred_setTheory finite_mapTheory alistTheory rich_listTheory llistTheory arithmeticTheory pairTheory sortingTheory relationTheory totoTheory comparisonTheory bitTheory sptreeTheory wordsTheory wordsLib set_sepTheory indexedListsTheory stringTheory ASCIInumbersLib machine_ieeeTheory
Second, as a clearing house for things that don't fit anywhere else and that should probably be upstreamed or extirpated eventually. But I disagree that having such clearing house is something transient or unnecessary; the contents of the clearing house are of course transient, but I don't imagine a future when the need for the clearing house itself disappears. It will probably always be the case that there are things we want to upstream but can't be bothered with right now. Having an intermediate point such as miscTheory
, to which they can be halfway-upstreamed, strikes me as a way nicer workflow than always going directly to HOL: that might lead to the same kind of chaos we've had last week all the time instead of once in a blue moon.
Indeed, it is only the parsing issue that bothers me. I am mostly happy with the use of misc
to act as a simple open
(though, strictly, it is preamble
that you open) for the purpose of giving us a logical baseline that removes the need for many open
declarations.
Note how misc
has already been hacked with in an attempt to try to let people use its parsing context: it starts with a whole bunch of calls to explicitly remove names that are coming in from a bunch of real theories (i.e., theories about the real numbers). When I see calls like that, or conversely, uses of misc$foo
, it seems to me that these are a symptom of a problem. The cure for this problem is to never use the misc
parsing context, and to not let misc
define constants.
misc
and its future:If you have misc
, then you will continue to have misc
until you make a concerted effort to get rid of it. Having a transient policy that encourages it to grow is only going to ensure that we never get rid of it.
(Johannes suggests that we have it as a clearing-house that acts as a moving target, but I personally think that's pretty ugly. In particular, I can do a recursive grep for TODO: HOL-move
across the CakeML directories as easily as page through a miscScript.sml
looking for much the same. Doing that at set intervals seems like it shouldn't be any more disruptive than pulling things out of misc
. Indeed, it would be less disruptive because only affected theories and their descendants would need recompiling, instead of everything.)
As for things that aren't going to get adopted by HOL:
read_bytearray. This constant is just not going to cut it.
GENLIST
and OPT_MMAP
)..common..
background theory about abstract treatments of memory. That's where you might also put all_words
and asm_write_bytearray
. If you combine all those into such a theory, and convince yourself that you have a nice, general-purpose abstraction that isn't irrevocably tied to CakeML-specific choices, that whole theory might be usefully transferred to HOL.IMAGE SUC x SUBSET IMAGE SUC y UNION IMAGE SUC z <=> x SUBSET y UNION z
. Never, never, never. Use of SUC
makes it hopelessly specific, and it should clearly be a combination of two theorems: one about distribution of IMAGE
over UNION
and one about IMAGE
over SUBSET
. Both should be about arbitrary injective functions rather than just SUC
.
MAP3
. Why should HOL be a home to something that is never used?
fromList2 l = SND (FOLDL (\(i,t) a. (i + 2,insert i a t)) (0,LN) l)
Clearly this really is irrevocably tied to weird CakeML choices about maps that only want to define entries for even-numbered keys. Clearly this should be defined with a GENLIST call to number the elements of the list, followed by a call to fromAList
. But then, the one lemma about this in miscTheory
becomes quite trivial. Clearly, no-one would find this an interesting addition to HOL.
EVERY_o
- as per my comment (recently added on the cleanup
branch), this is the GSYM
of an existing theorem. Unless you can make a strong argument that this way round should be an automatic rewrite, there's no need for this in HOL.
anub
is somewhat interesting. It doesn't need to be defined with an accumulator, though the version without has a more interesting termination proof and needs a removekey
function or similar to be used as well. The version that's there is probably pragmatically better (though both are quadratic in worst case). I'm not keen on accumulator-versions for HOL, so when I do define the other, there will presumably be mass breakage. (Or I define a version in HOL called alist_norm
and leave anub
completely untouched.)
So, to answer one of @xrchz's questions: yes, I do believe some of the stuff in misc
should be pushed back to the theories that use it. If there are multiple theories using things, then those things could be put into a ancestor theory that just those theories inherit from. (We have backend_common
as an example of such a theory already.)
@mn200 are you suggesting by "if it's to be defined at all" that read_bytearray
should be an overloading as follows?
read_bytearray a c gb = OPT_MMAP gb (GENLIST (λi. a + n2w i) c)
I don't think read_bytearray
should be an overloading.
But I do agree that it seems like a strangely specific definition to move to HOL. I think it fits best in compiler/backend/semantics/backendPropsScript.sml in CakeML, which is a file that has the following fitting readme comment:
General definitions and theorems that are useful within the proofs
about the compiler backend.
Similarly, in my mind fromList2
also belongs there.
miscTheory
in the ideal stateShould we be aiming to remove miscTheory
altogether? I think this is a question to answer first, because most other decisions depend on it.
miscTheory
to disappear and view it only as a temporary measure.miscTheory
will only disappear if we try to get rid of it. I think #549 is an example of such trying, but I presume he means more.open
ing "the HOL libraries". However, as @mn200 noted, preamble
serves this purpose and does not require miscTheory
.Aggregating these points, I believe miscTheory
ought to go. I would be happy to hear further contrary views on this matter. (I have one minor such point at the end of this message.) If there aren't any, I suppose we all agree that we are trying to get rid of miscTheory
and should double down on #549 and related efforts.
miscTheory
Question for @mn200: is the list of exceptions above exhaustive? (Relative to the contents of miscTheory
on the cleanup
branch; there will be more decisions for things still in the rest of the repo.) For each, I have a straightforward opinion (always about the ideal/target state; there may be intermediate waypoints to getting there):
read_bytearray
: go to backendProps
as @myreen suggestsSUC
-based lemmas: delete them and do the proofs with the more general theorems in HOL instead. If the proofs are made much worse thereby, and the SUC
-based lemmas are not used more than once, move them to their use site. Otherwise we return to discussion.MAP3
- MAP3
is used in compilationLib.sml
. I think it should go to listTheory
.fromList2
- switch to the better definition suggested by @mn200, and move to backend_common
or similar (depending on its exact use sites).EVERY_o
- delete it and fix proofs that use it to use the version in HOL.anub
- move to HOL in the same theory as a non-accumulator version (cf. REVERSE
and REV
).Since it's supposed to just be for a short time, I'd rather leave it unregulated. However, if people still want rules about what goes into miscTheory
, I have remaining disagreements about what those rules should be. I will voice them if they become relevant.
miscTheory
Currently, a lot of work goes into keeping the CakeML code base somewhat reasonable (and it's still a sprawling mess, as code bases are wont to be). The rough workflow for this seems to be:
TODO: move
comment.TODO: move
comment to miscTheory
miscTheory
At every step, there are opportunities for tidying / improving lemmas, reconciling duplicates, removing unused experiments, choosing a better destination than miscTheory
, etc. Without miscTheory
, steps 2 and 3 need to be combined into one. I worry that this makes the already daunting job even more daunting. However, I'm happy to give it a go and would love to learn that this worry is unfounded.
This conversation reminds me of byteTheory
as proposed in #521.
My list wasn't exhaustive; it was meant to demonstrate the variety of ways in which things are not right for moves to HOL. Another example would be the theorem that turns the iff that defines SUBSET
into one direction of the implication. I'm mostly in agreement with how you'd handle the other cases (still don't like MAP3
; what's next MAP
s 4 through 10?)
I wouldn't seriously suggest using an overload for read_bytearray
, but it strikes me as akin to putting a constant into HOL to represent adding one to the length of a list.
The problem with having misc
at all is that it becomes a dumping ground for random crap with the implicit suggestion that I have to sort through it looking for the good stuff. Now, I admit I'm being hyperbolic: the ratio of good stuff to crap is actually very high. I think having TODOs in the original files is better practice though: energy can be focused on problems in the context of their use.
If
misc
defines constants, you have to have its grammar in scope to see them cleanly (the alternative is having to writemisc$constname
). But if you do that, you also bring in everything else thatmisc
imports, which is pretty well all of HOL (theories of the reals, integers, floating-point numbers, paths, bags, lazy lists, etc).This is disastrous for isolation of concerns.