cpitclaudel / alectryon

A collection of tools for writing technical documents that mix Coq code and prose.
MIT License
228 stars 36 forks source link

Why does code following a .. code-block end up nested into it? #20

Open erikmd opened 3 years ago

erikmd commented 3 years ago

Hi @cpitclaudel,

I'm not sure but I believe I found a small bug when processing alectryon.py --frontend coq+rst --backend webpage foo.v

on this foo.v file:

Check nat.
(* .unfold *)

(*| Test

.. code-block:: Coq

   Fixpoint entier_nat_ind (P : entier_nat -> Prop)
   (fZero : P Zero) (fSucc : forall n : entier_nat, P n -> P (Succ n))
   (n : entier_nat) : P n :=
     match n with
     | Zero => fZero
     | Succ n => fSucc n (entier_nat_ind P fZero fSucc n)
     end.
|*)

Check nat.
(* .unfold *)

I got the following page:

2021-03-11_00-06-38_Screenshot_foo_html

Would you have some idea of a fix or a workaround? (for the 2nd "unfold" directive to be processed, I mean)

cpitclaudel commented 3 years ago

Yep, what's happening here is that unless you add an explicit .. coq:: header yourself, Alectryon infers one at the same indentation level as the previous text.

You want this:


Check nat.
(* .unfold *)

(*| Test

.. code-block:: Coq

   Fixpoint entier_nat_ind (P : entier_nat -> Prop)
   (fZero : P Zero) (fSucc : forall n : entier_nat, P n -> P (Succ n))
   (n : entier_nat) : P n :=
     match n with
     | Zero => fZero
     | Succ n => fSucc n (entier_nat_ind P fZero fSucc n)
     end.

.. coq::
|*)

Check nat.
(* .unfold *)
erikmd commented 3 years ago

Hi @cpitclaudel, thanks for your reply, indeed that works!

But don't you think there is a bug when we don't put the .. coq:: line manually?

because in this case, Alectryon puts the .. coq:: explicitly in the HTML output (cf. the screenshot in my initial post), while I'd expect .. coq:: is a directive that shouldn't be part of the "raw" HTML text (?)

erikmd commented 3 years ago

Anyway, feel free to close the issue if you think omitting .. coq:: is not the "nominal syntax" and thereby Alectryon's behavior in this case should be ignored / kept as is.

cpitclaudel commented 3 years ago

But don't you think there is a bug when we don't put the .. coq:: line manually?

It's a gray area; I'd love your thoughts on how we could make it better.

because in this case, Alectryon puts the .. coq:: explicitly in the HTML output

That's not really what happens; rather, Alectryon first reconstructs a reST file from the input, and to do so it needs to infer which depth to place the code block at. The heuristic it uses is that it should be placed at the same level as the preceding text block, so it generates this:

.. coq::

   Check nat.
   (* .unfold *)

Test

.. code-block:: Coq

   Fixpoint entier_nat_ind (P : entier_nat -> Prop)
   (fZero : P Zero) (fSucc : forall n : entier_nat, P n -> P (Succ n))
   (n : entier_nat) : P n :=
     match n with
     | Zero => fZero
     | Succ n => fSucc n (entier_nat_ind P fZero fSucc n)
     end.

     .. coq::

        Check nat.
        (* .unfold *)

And then the ..coq part appears in the output because the thing above is a code block. This sounds nonsensical until you realize that this behavior is what we want for essentially any other directive, like .. note or .. exercise.

We could try allow/blocklisting certain environments, but I concluded it would be too complicated last time I looked at it. Happy to revisit that if you think there could be a better heuristic.

cpitclaudel commented 3 years ago

if you think omitting .. coq:: is not the "nominal syntax" a

Omitting the ..coq is very much the preferred syntax, and in fact the translator will remove all the redundant ones when it can, but this is a case where you need some sort of disambiguation.

(Consider if you had this instead:

(*|
Test

.. note::

   Some text

   .. code-block:: Coq

      Some code
|*)

Check nat.

Without some (predictable) heuristic, we don't know which block the final Check nat belongs to: to top level, the note, or (very unlikely) the code-block

erikmd commented 3 years ago

@cpitclaudel

That's not really what happens; rather, Alectryon first reconstructs a reST file from the input, and to do so it needs to infer which depth to place the code block at. The heuristic it uses is that it should be placed at the same level as the preceding text block, so it generates this:

[…]

And then the ..coq part appears in the output because the thing above is a code block. This sounds nonsensical until you realize that this behavior is what we want for essentially any other directive, like .. note or .. exercise.

We could try allow/blocklisting certain environments, but I concluded it would be too complicated last time I looked at it. Happy to revisit that if you think there could be a better heuristic.

OK, thanks for your explanation! I wasn't really aware of the fact one can have that many nested levels.

It's a gray area; I'd love your thoughts on how we could make it better.

On the other hand, it's true that in the case of a .. note immediately followed by some Coq code, attaching the Coq code that follows to the note looks sensible (more than coming back to the toplevel), but on the other hand, if we have a .. code-block or so (which precisely disables the evaluation of the Coq code), I believe it would be more natural to attach the following Coq code to the closest parent node that is not a .. code-block or so.

So ideally, some blacklist-based test could be implemented (with a list of node types, such as .. code-block, which share the semantics of describing "verbatim" code, as opposed to Coq code to be evaluated), and I guess this would contribute to make the migration of a coqdoc-based .v file to an alectryon-based .v file a bit more natural − fewer occurrences of .. coq blocks to add.

Maybe, the algorithm could be as follows:

Does this make sense to you?

cpitclaudel commented 3 years ago

It makes sense, with two difficulties: first, you have to track reST blocks now, not just indentation (the current code doesn't track blocks, it just looks at indentation levels); second, you have a more complex heuristics, so it fails less often but it's harder to explain when it does.

I think the best of both world could be to use this heuristic to produce warnings?

As a side note, if we go the heuristic route, then it would be interesting to think of ambiguous cases too, like this:

(*|
- Some list item
|*)

Some code

Should the code be inside the list item?

(*|
- Some list item
  continued on the next line 
|*)

Some code

What about here?

(The ground truth of what Alectryon does now is that in the first example the list ends and the code is at the top level after it, and in the second example the code is part of the list item)

But maybe a frame challenge is in order too: a lot of Alectryon is independent of the markup language (and in fact you can directly compile coqdoc documents), so we don't have to use an indentation-based language: if we were using asciidoc or an extended version of markdown, for example, then we wouldn't have that indentation problem. (reST is great for extensibility, but the syntax is frustrating in many ways, including the use of indentation instead of begin/end markers and the lack of nested inline markup)

erikmd commented 3 years ago

Hi @cpitclaudel

I think the best of both world could be to use this heuristic to produce warnings?

Probably yes: if a warning (saying sth like "ambiguous code: you may want to add a .. coq:: node manually") is simpler to implement rather than a full-blown, silent heuristic needing to keep track of all blocks… indeed, that would be nice :)

cpitclaudel commented 3 years ago

I don't think it would be easier to implement, unfortunately (it would have to do essentially the same work)