CakeML / cakeml

CakeML: A Verified Implementation of ML
https://cakeml.org

misc theory shouldn't define constants #573

Open mn200 opened 5 years ago

mn200 commented 5 years ago

If misc defines constants, you have to have its grammar in scope to see them cleanly (the alternative is having to write misc$constname). But if you do that, you also bring in everything else that misc imports, which is pretty well all of HOL (theories of the reals, integers, floating-point numbers, paths, bags, lazy lists, etc).

This is disastrous for isolation of concerns.

xrchz commented 5 years ago

To make progress on this issue, I think we will need to have a discussion on what miscTheory is for and related design decisions.

I do not agree with the issue as it stands, i.e., I think miscTheory should be allowed to define constants, and furthermore, I think all of HOL should be assumed to be in scope throughout CakeML.

But I also think miscTheory should be upstreamed out of existence, so any rules about it should make sense given that transience.

mn200 commented 5 years ago
  1. I don't know where you think all of misc is going to be upstreamed to, because there are theorems in it that are never going to make it to HOL.

  2. Developing with all of HOL in scope is a major pain and a maintainability disaster. If I understand you right, you'd advocate for set_grammar_ancestry ["misc", ...] (bringing misc into scope grammatically). If you do this, you can't use the variable name last (for example) because that's defined in path, even if, conceptually, your script has nothing to do with paths. A change to HOL's theory of bags can break a script file that never mentions or uses bags.

    Are you really in favour of making every developer of every script file aware of every change to every theory in HOL, and to have them change their variable names when new constants are added to theories about which they have never needed to be concerned?

xrchz commented 5 years ago

So why did I initially express disagreement? I will try to explain more below. Note that I'm using this discussion to better understand the issue and come to an informed decision: I appreciate your input on this :)

IlmariReissumies commented 5 years ago

I see two uses for miscTheory. First and less importantly, it serves a role similar to bossLib for those who can't be bothered to start every theory with an opening salvo like

open HolKernel bossLib boolLib boolSimps lcsymtacs Parse libTheory
open bagTheory optionTheory combinTheory dep_rewrite listTheory pred_setTheory finite_mapTheory alistTheory rich_listTheory llistTheory arithmeticTheory pairTheory sortingTheory relationTheory totoTheory comparisonTheory bitTheory sptreeTheory wordsTheory wordsLib set_sepTheory indexedListsTheory stringTheory ASCIInumbersLib machine_ieeeTheory

Second, as a clearing house for things that don't fit anywhere else and that should probably be upstreamed or extirpated eventually. But I disagree that having such a clearing house is something transient or unnecessary; the contents of the clearing house are of course transient, but I don't imagine a future in which the need for the clearing house itself disappears. It will probably always be the case that there are things we want to upstream but can't be bothered with right now. Having an intermediate point such as miscTheory, to which they can be halfway-upstreamed, strikes me as a much nicer workflow than always going directly to HOL: that might make the kind of chaos we had last week happen all the time instead of once in a blue moon.

mn200 commented 5 years ago

Indeed, it is only the parsing issue that bothers me. I am mostly happy with the use of misc to act as a simple open (though, strictly, it is preamble that you open) for the purpose of giving us a logical baseline that removes the need for many open declarations.

Note how misc has already been hacked in an attempt to let people use its parsing context: it starts with a whole bunch of calls that explicitly remove names coming in from various real theories (i.e., theories about the real numbers). When I see calls like that, or conversely, uses of misc$foo, it seems to me that they are symptoms of a problem. The cure for this problem is to never use the misc parsing context, and to not let misc define constants.


The wider question of misc and its future:

If you have misc, then you will continue to have misc until you make a concerted effort to get rid of it. Having a transient policy that encourages it to grow is only going to ensure that we never get rid of it.

(Johannes suggests that we have it as a clearing house that acts as a moving target, but I personally think that's pretty ugly. In particular, I can do a recursive grep for TODO: HOL-move across the CakeML directories as easily as I can page through a miscScript.sml looking for much the same. Doing that at set intervals seems like it shouldn't be any more disruptive than pulling things out of misc. Indeed, it would be less disruptive, because only the affected theories and their descendants would need recompiling, instead of everything.)

As for things that aren't going to get adopted by HOL:

mn200 commented 5 years ago

So, to answer one of @xrchz's questions: yes, I do believe some of the stuff in misc should be pushed back to the theories that use it. If multiple theories use something, then it could be put into an ancestor theory that just those theories inherit from. (We already have backend_common as an example of such a theory.)

myreen commented 5 years ago

@mn200, are you suggesting, by "if it's to be defined at all", that read_bytearray should be an overloading as follows?

read_bytearray a c gb = OPT_MMAP gb (GENLIST (λi. a + n2w i) c)
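For readers less familiar with the HOL constants in that definition, here is a small plain-SML model of what it computes. It is only an illustration: the names `opt_mmap`, `genlist`, and `mem`, and the choice of `Word.word` addresses and `Word8.word` bytes, are assumptions of this sketch, not CakeML code. `OPT_MMAP gb` applies the partial byte lookup `gb` to every address and fails if any single lookup fails; `GENLIST (λi. a + n2w i) c` produces the `c` consecutive addresses starting at `a`.

```sml
(* Illustrative plain-SML model (hypothetical, not CakeML's code) of
     read_bytearray a c gb = OPT_MMAP gb (GENLIST (λi. a + n2w i) c)
   Addresses are modelled as Word.word, bytes as Word8.word. *)

(* Model of OPT_MMAP: map an option-returning function over a list,
   returning SOME of all results only if every call succeeds. *)
fun opt_mmap f [] = SOME []
  | opt_mmap f (x :: xs) =
      (case f x of
          NONE => NONE
        | SOME y =>
            (case opt_mmap f xs of
                NONE => NONE
              | SOME ys => SOME (y :: ys)))

(* Model of GENLIST: genlist f n = [f 0, f 1, ..., f (n - 1)] *)
fun genlist f n = List.tabulate (n, f)

(* Read c bytes starting at address a, looking each one up with gb. *)
fun read_bytearray (a : Word.word) (c : int)
                   (gb : Word.word -> Word8.word option) =
  opt_mmap gb (genlist (fn i => a + Word.fromInt i) c)

(* Hypothetical memory: addresses 0w0..0w3 hold the bytes 0w0..0w3;
   every other address is unmapped. *)
val mem : Word.word -> Word8.word option =
  fn w => if w < 0w4 then SOME (Word8.fromInt (Word.toInt w)) else NONE

(* read_bytearray 0w1 2 mem reads addresses 0w1 and 0w2;
   read_bytearray 0w3 2 mem fails because address 0w4 is unmapped. *)
```

`Word.word` addition wraps around, which mirrors the modular word addition in the HOL definition.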

I don't think read_bytearray should be an overloading.

But I do agree that it seems like a strangely specific definition to move to HOL. I think it fits best in compiler/backend/semantics/backendPropsScript.sml in CakeML, which is a file that has the following fitting readme comment:

General definitions and theorems that are useful within the proofs
about the compiler backend.

Similarly, in my mind fromList2 also belongs there.

xrchz commented 5 years ago

miscTheory in the ideal state

Should we be aiming to remove miscTheory altogether? I think this is a question to answer first, because most other decisions depend on it.

Aggregating these points, I believe miscTheory ought to go. I would be happy to hear further contrary views on this matter. (I have one minor such point at the end of this message.) If there aren't any, I suppose we all agree that we are trying to get rid of miscTheory and should double down on #549 and related efforts.

Where to put the contents of miscTheory

Question for @mn200: is the list of exceptions above exhaustive? (Relative to the contents of miscTheory on the cleanup branch; there will be more decisions for things still in the rest of the repo.) For each, I have a straightforward opinion (always about the ideal/target state; there may be intermediate waypoints to getting there):

Policy in the meantime

Since it's supposed to just be for a short time, I'd rather leave it unregulated. However, if people still want rules about what goes into miscTheory, I have remaining disagreements about what those rules should be. I will voice them if they become relevant.

CakeML cleanup in a world without miscTheory

Currently, a lot of work goes into keeping the CakeML code base somewhat reasonable (and it's still a sprawling mess, as code bases are wont to be). The rough workflow for this seems to be:

  1. Move generic things to the top of the script file, with a TODO: move comment.
  2. Move things with a TODO: move comment to miscTheory
  3. Upstream things in miscTheory

At every step, there are opportunities for tidying / improving lemmas, reconciling duplicates, removing unused experiments, choosing a better destination than miscTheory, etc. Without miscTheory, steps 2 and 3 need to be combined into one. I worry that this makes the already daunting job even more daunting. However, I'm happy to give it a go and would love to learn that this worry is unfounded.

myreen commented 5 years ago

This conversation reminds me of byteTheory as proposed in #521.

mn200 commented 5 years ago

My list wasn't exhaustive; it was meant to demonstrate the variety of ways in which things are not suitable for moving to HOL. Another example would be the theorem that turns the iff that defines SUBSET into one direction of the implication. I'm mostly in agreement with how you'd handle the other cases (I still don't like MAP3; what's next, MAPs 4 through 10?).

I wouldn't seriously suggest using an overload for read_bytearray, but it strikes me as akin to putting a constant into HOL to represent adding one to the length of a list.

mn200 commented 5 years ago

The problem with having misc at all is that it becomes a dumping ground for random crap with the implicit suggestion that I have to sort through it looking for the good stuff. Now, I admit I'm being hyperbolic: the ratio of good stuff to crap is actually very high. I think having TODOs in the original files is better practice though: energy can be focused on problems in the context of their use.