coq / vscoq

Visual Studio Code extension for Coq
MIT License
335 stars 68 forks source link

buggy ppstring with notations #924

Open tomtomjhj opened 6 days ago

tomtomjhj commented 6 days ago

While implementing pretty printer in vscoq.nvim (https://github.com/tomtomjhj/vscoq.nvim/blob/main/lua/vscoq/pp.lua), I noticed that ppstrings for terms that contain notations have some issues with breaks.

The following is tested with coq vesion 8.19.2, vscoq-language-server 2.2.1, latest version of vscoq.nvim with window width fixed to 80 character cells.

Let's first define some functions for testing.

Definition F (q w e r t y u i o p : nat) : nat := q + w + e + r + t + y + u + i + o + p.

Definition equiv (x y : nat) := True.

1. Inconsistency in the locations of breaks

First, let's take a look at the ppstring for the = (the default notation for eq) and a custom notation for equiv defined similarly to =.

(* defined in Init/Logic.v *)
(* Notation "x = y" := (eq x y) : type_scope. *)

(* custom notation; requires level because "A left-recursive notation must have an explicit level." *)
Notation "x ≡ y" := (equiv x y) (at level 70).

For the term containing =:

Goal F 111 111 111 111 111 111 111 111 111 111 = F 111 111 111 111 111 111 111 111 111 111.

I get the following ppstring from vscoqtop,

{ "Ppcmd_box", { "Pp_hovbox", 0 }, { "Ppcmd_glue", {
  { "Ppcmd_box", { "Pp_hovbox", 2 }, ... },
  { "Ppcmd_tag", "constr.notation", { "Ppcmd_string", " =" } },
  { "Ppcmd_print_break", 1, 0 },
  { "Ppcmd_box", { "Pp_hovbox", 2 }, ... }
} } }

which is supposed to be printed like this.

F 111 111 111 111 111 111 111 111 111 111 =
F 111 111 111 111 111 111 111 111 111 111

But for the term containing :

Goal F 111 111 111 111 111 111 111 111 111 111 ≡ F 111 111 111 111 111 111 111 111 111 111.

I get this ppstring,

{ "Ppcmd_box", { "Pp_hovbox", 0 }, { "Ppcmd_glue", {
  { "Ppcmd_box", { "Pp_hovbox", 2 }, ... },
  { "Ppcmd_print_break", 1, 0 },
  { "Ppcmd_tag", "constr.notation", { "Ppcmd_string", "≡ " } },
  { "Ppcmd_box", { "Pp_hovbox", 2 }, ... }
} } }

which is printed like this

F 111 111 111 111 111 111 111 111 111 111
≡ F 111 111 111 111 111 111 111 111 111 111

Note that the location of break is different, despite that their notation definition is more or less the same.

What's more confusing is what the documentation says (https://coq.inria.fr/doc/v8.19/refman/user-extensions/syntax-extensions.html#notations) (emphasis mine)

The default printing of notations is rudimentary. For printing a notation, a formatting box is opened in such a way that if the notation and its arguments cannot fit on a single line, a line break is inserted before the symbols of the notation and the arguments on the next lines are aligned with the argument on the first line.

According to the first emphasized point, the break location of is correct (before the symbols of the notation), but the second point says that the alignment of its rhs is wrong. If I understood the documentation correctly, it says the result should be something like this:

  F 111 111 111 111 111 111 111 111 111 111
≡ F 111 111 111 111 111 111 111 111 111 111

If I add extra space for notation (as per "This is performed by adding extra spaces between the symbols and parameters"), I get yet another different result:

Notation "x  ≡  y" := (equiv x y) (at level 70).
{ "Ppcmd_box", { "Pp_hovbox", 0 }, { "Ppcmd_glue", {
  { "Ppcmd_box", { "Pp_hovbox", 2 }, ... },
  { "Ppcmd_print_break", 1, 0 },
  { "Ppcmd_tag", "constr.notation", { "Ppcmd_string", "≡" } },
  { "Ppcmd_print_break", 1, 0 },
  { "Ppcmd_box", { "Pp_hovbox", 2 }, ... }
} } }
F 111 111 111 111 111 111 111 111 111 111 ≡
F 111 111 111 111 111 111 111 111 111 111

2. Lack of breaks

What's more critical is that when you specify format, it doesn't get any breaks at all.

Notation "x ≡ y" := (equiv x y) (at level 70, format "x  ≡  y").

Goal F 111 111 111 111 111 111 111 111 111 111 ≡ F 111 111 111 111 111 111 111 111 111 111.
{ "Ppcmd_glue", {
  { "Ppcmd_box", { "Pp_hovbox", 2 }, ... },
  { "Ppcmd_tag", "constr.notation", { "Ppcmd_string", " " } },
  { "Ppcmd_tag", "constr.notation", { "Ppcmd_string", "≡" } },
  { "Ppcmd_tag", "constr.notation", { "Ppcmd_string", " " } },
  { "Ppcmd_box", { "Pp_hovbox", 2 }, ... }
} }
F 111 111 111 111 111 111 111 111 111 111 ≡ F 111 111 111 111 111 111 111 111 111
                                              111

Note that the ppstring doesn't have a hovbox around the whole term, and there is no break between lhs, , and rhs.

A similar issue happens when you print something with the builtin utf8 notations. For example, this is the rendered result of vscoq/about for a lemma in Iris:

wp_fupd :
∀ {hlc : has_lc} {Λ : language} {Σ : gFunctors} {irisGS_gen0 : irisGS_gen hlc Λ Σ} (s : stuckness) (E : coPset) (e : Λ
                                                                                                                     .(language.expr)) (Φ : Λ
                                                                                                                                            .(language.val)
                                                                                                                                            → iProp
                                                                                                                                                Σ),
  WP e @ s; E {{ v, |={E}=> Φ v }} ⊢ WP e @ s; E {{ v, Φ v }}

Note that there is no break between each binder for the notation, defined as follows in Unicode/Utf8_core.v.

Notation "∀ x .. y , P" := (forall x, .. (forall y, P) ..)
  (at level 10, x binder, y binder, P at level 200,
  format "'[  ' '[  ' ∀  x  ..  y ']' ,  '/' P ']'") : type_scope.

3. Lack of spaces with explicit breaks

Using '/' in the format ensures line breaks

Notation "x ≡ y" := (equiv x y) (at level 70, format "'[' x  ≡ '/' y ']'").

Goal F 111 111 111 111 111 111 111 111 111 111 ≡ F 111 111 111 111 111 111 111 111 111 111.
{ "Ppcmd_box", { "Pp_hovbox", 0 }, { "Ppcmd_glue", {
  { "Ppcmd_box", { "Pp_hovbox", 2 }, ... },
  { "Ppcmd_tag", "constr.notation", { "Ppcmd_string", " " } },
  { "Ppcmd_tag", "constr.notation", { "Ppcmd_string", "≡" } },
  { "Ppcmd_print_break", 0, 0 },
  { "Ppcmd_box", { "Pp_hovbox", 2 }, ...  }
} } }
F 111 111 111 111 111 111 111 111 111 111 ≡
F 111 111 111 111 111 111 111 111 111 111

But note that the Ppcmd_print_break doesn't have a space associated with it. So if the term is smaller than the window width, there is no space between and the rhs.

Goal F 1 1 1 1 1 1 1 1 1 1 ≡ F 1 1 1 1 1 1 1 1 1 1.
F 1 1 1 1 1 1 1 1 1 1 ≡F 1 1 1 1 1 1 1 1 1 1
Durbatuluk1701 commented 5 days ago

For 1. and 2. I notice the same behavior in VsCoq, as well as coqtop (8.20): image

The appearance of this in coqtop makes me believe that the fundamental Pp issue here (which if I understand correctly, it is that you want the default Pp for a notation like x OP y to be formatted the same way as x = y is?) would need to be solved from above VsCoq.

However, for 3. I do not notice this issue on main for VsCoq (possibly related to me being on 8.20 though).

tomtomjhj commented 5 days ago

it is that you want the default Pp for a notation like x OP y to be formatted the same way as x = y is

I prefer x = y's formatting, but I think whatever that is consistent and predictable is fine.


The appearance of this in coqtop makes me believe that the fundamental Pp issue here [...] would need to be solved from above VsCoq.

I also believe there are some issues in coq itself, but the following example suggests that vscoqtop is doing some more wrong stuff.

Definition F (q w e r t y u i o p : nat) : nat := q + w + e + r + t + y + u + i + o + p.
Definition equiv (x y : nat) := True.
Notation "x ≡ y" := (equiv x y) (at level 70, format "x  ≡  y").

(* From Coq Require Import Utf8. *)

Lemma lemma {q} {a : bool} {w} {s : bool} {e} {d : bool} {r} (f : bool) {t y u i o p} :
  F q w e r t y u i o p ≡ F q w e r t y u i o p /\ a = s /\ s = d /\ d = f.
Admitted.
About lemma.

With From Coq Require Import Utf8. commented out, I get

lemma :
forall {q : nat} {a : bool} {w : nat} {s : bool} {e : nat} 
  {d : bool} {r : nat} (f : bool) {t y u i o p : nat},
F q w e r t y u i o p ≡ F q w e r t y u i o p /\ a = s /\ s = d /\ d = f

from both coqtop and vscoq2.

But when From Coq Require Import Utf8. is active, the result is different.


However, for 3. I do not notice this issue on main for VsCoq (possibly related to me being on 8.20 though).

I would argue that this is an additional bug in the vscode client. The first 0 in { "Ppcmd_print_break", 0, 0 } means that if the break does not introduce a newline, then it should be rendered as 0 spaces (see https://ocaml.org/manual/5.2/api/Format.html#VALprint_break). In fact, coqtop 8.20 prints F 1 1 1 1 1 1 1 1 1 1 ≡F 1 1 1 1 1 1 1 1 1 1 as expected.

ejgallego commented 5 days ago

Hi folks,

it seems there is a bit of confusion about printing in Coq, let me see if I can be of help.

Coq uses to print the OCaml Format library.

The Pp_* type that you see, is just a free reification of the Format language / constructors there.

Traditionally, Coq then called Format to turn a Pp_* to a string, and printed that. Format uses a particular formatting algorithm, which is dependent on the margins and width settings.

Some years ago, I changed the way CoqIDE prints as with Format, printing in CoqIDE was a bit of a PITA if for example the window changed appearance. Also I found it convenient to allow clients to implemented more fancy printing.

Thus, CoqIDE (which is an OCaml program) changed the wire representation and instead of sending a string started to send the Pp_* commands, but it called Format to render them, but with the correct width, etc... that saved a roundtrip and other complicated stateful protocol stuff about printing width etc...

I am not sure how current VsCoq algorithm is related to Format's algorithm, but my impression is that they are not the same? That is IMHO not a good idea as all the notations in Coq have been developed for the algorithm in OCaml's Format.

In jsCoq / coq-lsp we use Shachar's JavaScript reimplementation of Format's algorithm, so our results then to be quite (not always tho) consistent with what you can see in coqtop/CoqIDE. @tomtomjhj , the implementation is quite small, so maybe it could be useful to you (because indeed Format's algorithm is not so complex in the end)

Durbatuluk1701 commented 5 days ago

@ejgallego thank you for the information, I have a feeling that maybe VsCoq's algorithm does a little bit more than is possible format as it allows for ellipsis/elided boxes that can be expanded and closed (although maybe this is possible with format and I am just unaware). I do agree in general though that we need to be very careful to try to implement it as closely to Format as possible to avoid bugs like it appears there is here.

Additionally, do you have any idea why it seems like the equiv notation that @tomtomjhj defined prints differently than native = for coqtop? Is there something more happening in the background for =?

ejgallego commented 5 days ago

@Durbatuluk1701 Format can do that too, tho not interactively, see format's set_depth et al methods.

Tho I think what you see here is just differences in the basic layout algo? I don't know, I'll let vsCoq devs comment on that.

Additionally, do you have any idea why it seems like the equiv notation that @tomtomjhj defined prints differently than native = for coqtop? Is there something more happening in the background for =?

Maybe it also needs the no associativity flag?

eq is as this in 8.21:

Reserved Notation "x = y" (at level 70, no associativity).
tomtomjhj commented 5 days ago

In jsCoq / coq-lsp we use Shachar's JavaScript reimplementation of Format's algorithm, so our results then to be quite (not always tho) consistent with what you can see in coqtop/CoqIDE. @tomtomjhj , the implementation is quite small, so maybe it could be useful to you (because indeed Format's algorithm is not so complex in the end)

The pretty printers (ppstring -> rendered output) of VsCoq2's vscode client and vscoq.nvim are based on Oppen's algorithm (http://i.stanford.edu/pub/cstr/reports/cs/tr/79/770/CS-TR-79-770.pdf). I just took a quick look at OCaml's Format module (https://github.com/ocaml/ocaml/blob/trunk/stdlib/format.ml), and it seems it is also based on Oppen's algorithm, though it doesn't explicitly say so. Indeed, the rendered results of vscoq.nvim are quite similar to coqide's.

In the cases where vscoq2/vscoq.nvim's results are significantly worse than coqide/coqtop (https://github.com/coq-community/vscoq/issues/924#issuecomment-2395138098), I noticed that the ppstring sent from the vscoqtop server doesn't contain enough Ppcmd_print_break. This means that either (1) vscoqtop has a bug in generating ppstring; or (2) coqide's ppstring printer converts non-breaking spaces into breaks.

gares commented 5 days ago

Many thanks for the detailed report!

rtetley commented 4 hours ago

Thanks a lot for your report. In the above when you describe issues with the positioning of breaks in the Pp_cmd themselves I am pretty sure this is a Coq problem. However that last issue https://github.com/coq/vscoq/issues/924#issuecomment-2395138098 does sound like a problem on our side. I'll investigate.