[enhancement] `Require` normalization should be more fine-grained and robust to side-effects

JasonGross commented 3 years ago

Currently, require minimization is done all at once, and we basically give up if we cannot lift requires to the top of the file.

Proposal:

Split off the code that splits Import/Export from Require and absolutizes both of them. If this step changes the error message, then we should give up.
When inlining a file, first we try to inline it (in the 4 ways we currently do) with the Requires all lifted to the top level. If none of these work, we try again to inline doing nothing more than splitting the Requires from the Import/Exports and absolutizing them.
Add a pass to minimize_file which attempts to move Requires up as far as possible, as follows:
- First try moving the requires up all at once, as is currently done. If it succeeds, then skip the next steps, otherwise do them.
- Classify all statements as either (1) part of the admit axiom header, (2) a Require, or (3) neither of the above
- In reverse order, for each statement of type (2) (a Require), do binary search on finding the earliest statement of type (3) to which we can insert immediately before it the Require and all Requires transitively implied by this one which occur after the statement of type (3)

JasonGross commented 3 years ago

To do the final step, valid_nondefault_actions in binary search in the minimizer drivers should be generalized to be a function of the current state. The state should be two tuples of pairs of ([index of] require statement, index of statement to insert the require before). The first tuple should be the reversed list of finished requires; the second should be the reversed list of requires that might be moved earlier. At each state, the valid actions are the list of positions between 0 and the current position of the first require in the second list (exclusive). The default action in step state is to move a require from the second list to the end of the first, unchanged; the other actions are to do the move while changing the index (and simultaneously re-rooting all transitive dependencies of this require to be no higher than the indeed we're changing it to). To check/save a state, we insert all requires from both lists at the appropriate location, probably by walking the list of statements in order, yielding in reverse order all statements that should be inserted before the current one, and then yielding the current statement.

This way, we can perform binary search for each statement independently.

JasonGross commented 3 years ago

Hmm, perhaps we should iterate the binary search, where the outer list of actions is a reversed list of int up to # requires (exclusive), and we try to insert everything below the int (inclusive) at the 0 position. (Default action inserts nothing.)

We'd need to make sure to maintain the invariant that the list of requires is sorted by insertion index, and we want to use a stable sort so that we maintain dependency-ordering invariant (or maybe also sorting by index of the original require as well is fine).

JasonGross commented 3 years ago

Probably we want to stick this pass right before splitting the file into definitions (https://github.com/JasonGross/coq-tools/blob/9b3989871caa1315854a196f195ab2e4ce1a0c48/find-bug.py#L1082-L1088), and again at the end, before (or perhaps after?) stripping empty sections (https://github.com/JasonGross/coq-tools/blob/9b3989871caa1315854a196f195ab2e4ce1a0c48/find-bug.py#L1156) (maybe stripping empty sections should run based on statements rather than on the file contents with a regex?). The pass should run based on the file name, and should use get_coq_statement_byte_ranges and get_byte_references_for. We should generate a list of statements as follows:

interstatement text, which does not occur in any statement range, or else is the leading or trailing comment or whitespace of a statement
non-require statement, which must not start nor end with whitespace nor comment (these are split off)
any statement where the first non-comment non-whitespace text is Require\s should be split as follows:
- first we pull all the module references overlapping with this statement, and add them one at a time as Requires
- next we strip off Require from the start, split off comments and whitespace, and if the result starts with either Import or Export, then we add this statement as a non-require statement

This should be a fairly robust way to generate the statements that we need to run this pass. Perhaps someday it'll be replaced by proper parsing of the Coq AST

JasonGross / coq-tools

[enhancement] `Require` normalization should be more fine-grained and robust to side-effects #78