gfngfn / SATySFi

A statically-typed, functional typesetting system

GNU Lesser General Public License v3.0


New API to fix footnote duplication problem #266

Open yasuo-ozu opened 3 years ago

yasuo-ozu commented 3 years ago

We sometimes need to evaluate the same inline-text or block-text more than once. (For example, the +xgenlisting command in satysfi-enumitem does this.)

Evaluating an inline-text twice causes several problems. For example, the counter of \footnote is incremented twice: http://satysfi-playground.tech/permalink/71a660eee721e76a94ba063272874e37eafdff3970fa8ba95d6d923bc4efef32

To prevent this, I suggest a new API that commands using let-mutable can rely on to detect repeated evaluation.

Design

get-command-identity ctx: context -> string

It returns a hash value that identifies the command invocation for which ctx was given.

Using this API, the \footnote command (or FootnoteScheme) will be defined like:

let-mutable footnote-ref <- 0 in
let-mutable footnote-dict <- [] in
let-inline ctx \footnote it =
    % ...
    let hash = get-command-identity ctx in
    let () =
        % pseudocode: count a footnote only the first time its hash is seen
        if hash is not in footnote-dict then
            let () = footnote-ref <- !footnote-ref + 1 in
            footnote-dict <- hash :: !footnote-dict
        else
            ()
    in
    % ...

Example

In the simplest case, the hash is generated from the byte offset at which the command is used in the inline-text (determined when the AST is generated).

But in complicated cases, it will not work:

let-inline ctx \footnote-wrap it =
    read-inline ctx { 
        \footnote(#it;);  % (a) bytes from file head
    }
in
document '<
    +p {
        \footnote-wrap { hello } % (b) bytes from file head
        \footnote-wrap { world } % (c) bytes from file head
   }
>

The correct behavior of this code is to display both hello and world as footnotes, so the two hashes must differ even though \footnote itself is always called at the same location (a). To solve this, make ctx contain the history of byte locations and calculate the hash as follows:

in \footnote call of \footnote-wrap { hello } -- get-command-identity ctx returns hash([(a); (b)])
in \footnote call of \footnote-wrap { world } -- get-command-identity ctx returns hash([(a); (c)])
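
The scheme above can be modelled outside SATySFi. The following Python sketch is only an illustration under stated assumptions: `Context`, `enter`, `get_command_identity`, `FootnoteScheme`, and the byte offsets 100/200/300 are hypothetical stand-ins, not SATySFi APIs, and the tuple of offsets is used directly as the identity where a real implementation would hash it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    # History of byte offsets of the command call sites, innermost first.
    call_sites: tuple = ()

    def enter(self, offset):
        """Return a new context with one more call-site offset prepended."""
        return Context((offset,) + self.call_sites)


def get_command_identity(ctx):
    # Hypothetical model of the proposed primitive: the identity is derived
    # from the whole call-site history (a real version would hash it).
    return ctx.call_sites


class FootnoteScheme:
    def __init__(self):
        self.counter = 0  # plays the role of footnote-ref
        self.seen = {}    # plays the role of footnote-dict: identity -> number

    def footnote(self, ctx, text):
        ident = get_command_identity(ctx)
        if ident not in self.seen:
            # First evaluation of this call site: allocate a new number.
            self.counter += 1
            self.seen[ident] = self.counter
        return (self.seen[ident], text)


OFFSET_A, OFFSET_B, OFFSET_C = 100, 200, 300  # assumed offsets for (a), (b), (c)


def footnote_wrap(scheme, ctx, text):
    # \footnote is always called from the same place (a) inside \footnote-wrap.
    return scheme.footnote(ctx.enter(OFFSET_A), text)


def evaluate_paragraph(scheme, root):
    # Models reading the inline-text that contains both \footnote-wrap calls.
    return [
        footnote_wrap(scheme, root.enter(OFFSET_B), "hello"),
        footnote_wrap(scheme, root.enter(OFFSET_C), "world"),
    ]


scheme = FootnoteScheme()
root = Context()
first = evaluate_paragraph(scheme, root)
second = evaluate_paragraph(scheme, root)  # same inline-text evaluated again
```

In this model the two `\footnote-wrap` uses get the identities `(100, 200)` and `(100, 300)`, so they receive distinct numbers, while the second evaluation finds both identities already registered and leaves the counter alone.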

yasuo-ozu commented 3 years ago

In short: in the current SATySFi, if an inline-text contains a command such as \footer, evaluating that inline-text twice increments the footer counter twice.

To avoid this, I propose the API

get-command-identity ctx: context -> string

With this API, in code such as

let-inline ctx \footer-wrap it = %...
in
% ...
{
    \footer-wrap{ hello }
}

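every time a command (here \footer-wrap) is used in an inline-text, its byte offset from the start of the file is appended to a list held in ctx. When get-command-identity ctx is called, a hash is generated from the contents of this list.

A command that updates let-mutable variables, such as \footnote, creates a dictionary in which it registers the hashes provided by get-command-identity, and updates its mutable variables only when the hash is not yet registered in that dictionary.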

elpinal commented 3 years ago

Is that a problem? Does it mean that every state-mutating command that may be used inside +xgenlisting or something similar is forced to use get-command-identity in order to avoid unintended behavior? I consider applying read-inline twice to the same inline-text to be the problem in itself. As for +xgenlisting, another extension to SATySFi might be needed to deal with state-mutating commands, but I argue that obliging the providers of commands like \footnote to manage effects in such a sophisticated way is not a good solution.

gfngfn commented 3 years ago

Thank you for having a discussion (& sorry for the late response).

As to the language design for inline texts and inline box rows, I have a thought close to @elpinal -san's one. That is, I suppose that applying read-inline twice to the same inline texts itself is a somewhat problematic usage. Inline texts in general have effects of mutating states, and thus basically they can be regarded as “affine” resources (though this is not reflected in type-level restriction).

Certainly, I also feel a slight need to consider that there would be some case where inline texts are essentially required to be used more than once. For instance, consider the case where there’s more than one choice of how to render it : inline-text depending on the total size of the inline box rows resulting from it:

let-inline ctx \decorate it =
  let ib1 = read-inline (some-settings-1 ctx) it in
  let ib2 = read-inline (some-settings-2 ctx) it in
  if first-one-is-better (get-natural-metrics ib1) (get-natural-metrics ib2) then
    ib1
  else
    ib2

IMHO, however, adding primitives like get-command-identity seems to introduce too much complication to the semantics of the language. I feel that how to solve such a problem is rather in the scope of the language design than that of just adding primitives. For example, if SATySFi has a kind of state-passing semantics (like that of Elm or React) and is free from mutable references, one can safely implement the command above by:

let-inline state ctx \decorate it =
  let (state1, ib1) = read-inline state (some-settings-1 ctx) it in
  let (state2, ib2) = read-inline state (some-settings-2 ctx) it in
  if first-one-is-better (get-natural-metrics ib1) (get-natural-metrics ib2) then
    (state1, ib1)
  else
    (state2, ib2)

(though this tends to make code somewhat redundant.)
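
To make the state-passing idea concrete, here is a small Python model. It is a sketch under assumptions, not SATySFi: `State`, `read_inline`, `footnote`, `natural_width`, the `font_size` setting, and the width limit are all hypothetical. Commands return an updated state next to their boxes, so a trial evaluation leaves no trace unless its returned state is kept.

```python
from typing import NamedTuple


class State(NamedTuple):
    footnote_count: int = 0


def footnote(state, ctx, text):
    """A command returns the updated state together with its boxes."""
    n = state.footnote_count + 1
    return State(footnote_count=n), [f"[{n}]"]


def read_inline(state, ctx, inline_text):
    """Evaluate plain strings and (command, argument) pairs, threading the state."""
    boxes = []
    for item in inline_text:
        if isinstance(item, str):
            boxes.append(item)
        else:
            command, arg = item
            state, out = command(state, ctx, arg)
            boxes.extend(out)
    return state, boxes


def natural_width(ctx, boxes):
    # Toy metric: number of characters times the context's font size.
    return sum(len(b) for b in boxes) * ctx["font_size"]


def decorate(state, ctx, inline_text, max_width=80):
    settings1 = {**ctx, "font_size": 12}
    settings2 = {**ctx, "font_size": 10}
    # Both trial evaluations start from the same incoming state.
    state1, ib1 = read_inline(state, settings1, inline_text)
    state2, ib2 = read_inline(state, settings2, inline_text)
    if natural_width(settings1, ib1) <= max_width:
        return state1, ib1
    # The state produced by the discarded evaluation is simply dropped.
    return state2, ib2


text = ["See", (footnote, "a note"), "."]
state_after, boxes = decorate(State(), {}, text)
state_later, boxes_later = decorate(state_after, {}, text)
```

Although `decorate` evaluates its argument twice, each call advances the footnote count by exactly one, because only one of the two returned states survives.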

yasuo-ozu commented 3 years ago

Thanks for the discussion. I also agree with the state-passing approach, but it would be a heavily breaking change to the current syntax. As a first step toward making mutable variables obsolete, I suggest the following primitives:

(Type.t is inspired by the article "SATySFiでad hoc多相", i.e. "Ad hoc polymorphism in SATySFi".)

set-context-variable : string -> Type.t -> 'a -> context -> context
get-context-variable : string -> Type.t -> context -> 'a option
duplicate-context : context -> context
apply-context : context -> context -> ()

The goal of these primitives is to put all mutable variables inside the context.

Compared to the current syntax

- The timing at which mutable variables are updated is clearer.
- It solves the problems with let-mutable references.

Compared to the state-passing syntax

- It does not break backward compatibility.
- It is less redundant.
- It is not an elegant design.

yasuo-ozu commented 3 years ago

If folding state into the context is unsound, how about replacing SATySFi's current context with ('a, context)? I think keeping state and context separate in this way is more compatible, and we would not have to add primitives like *-context-variable, duplicate-context and apply-context. However, this approach cannot eliminate let-mutable, because a command provider such as \footnote still has to manage its own state corresponding to 'a.

yasuo-ozu commented 3 years ago

For example, regarding SATySFi's context as (int list, old-context), we can implement mutable behavior using let-mutable like:

let-mutable identical-number <- 0 in
% duplicate-context : context -> context
let duplicate-context ctx =
    let (l, etc) = ctx in 
    let l = !identical-number :: l in
    let () = identical-number <- !identical-number + 1 in
    (l, etc)
in
let-mutable mutable-state <- Dict.make in
% get-context-variable : string -> context -> int option
let get-context-variable str ctx =
    let-rec inner l =
        match Dict.get(l, str) !mutable-state with
            | Some(r) -> Some(r)
            | None -> match l with
                | _ :: l -> inner l
                | _ -> None
    in
    let (l, _) = ctx in
    inner l
in
% set-context-variable : string -> int -> context -> ()
let set-context-variable str num ctx =
    let (l, _) = ctx in
    let () = mutable-state <- Dict.set (l, str) num !mutable-state in
    ()
in
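let-inline ctx \footnote it =
    % ...
    % get-context-variable returns an option; start from 0 when unset
    let n =
        match get-context-variable `footnote-number` ctx with
            | Some(n) -> n
            | None    -> 0
    in
    % argument order follows the definition above: name, value, context
    let () = set-context-variable `footnote-number` (n + 1) ctx in
    % ...
in
let-inline ctx \eval-twice it =
  % measure with a duplicated context so the trial pass does not touch ctx's state
  let tmp-ctx = duplicate-context ctx in
  let tmp-ib = read-inline tmp-ctx it in
  let measuring = get-natural-metrics tmp-ib in
  read-inline (some-settings measuring ctx) it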
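
The lookup rule in the code above (search the innermost scope first, then fall back to enclosing scopes) can be modelled in Python as follows. This is a hedged sketch of the idea only: a context is assumed to be a pair of a scope-id tuple and an opaque old context, and `_fresh`, `_store`, and the function names are illustrative stand-ins for identical-number, mutable-state, and the proposed primitives.

```python
import itertools

_fresh = itertools.count()  # plays the role of identical-number
_store = {}                 # plays the role of mutable-state: (scope, name) -> value


def duplicate_context(ctx):
    """Return a child context whose writes are invisible to ctx."""
    scope, rest = ctx
    return ((next(_fresh),) + scope, rest)


def get_context_variable(name, ctx):
    """Look name up in the innermost scope first, then in enclosing scopes."""
    scope, _ = ctx
    while True:
        if (scope, name) in _store:
            return _store[(scope, name)]
        if not scope:
            return None
        scope = scope[1:]  # drop the innermost scope id and retry


def set_context_variable(name, value, ctx):
    scope, _ = ctx
    _store[(scope, name)] = value


def footnote(ctx):
    n = get_context_variable("footnote-number", ctx)
    if n is None:
        n = 0  # no footnote has been numbered in this scope chain yet
    set_context_variable("footnote-number", n + 1, ctx)
    return n + 1


root = ((), "old-context")
trial = duplicate_context(root)
trial_number = footnote(trial)  # measuring pass: writes only to the child scope
final_number = footnote(root)   # real pass: the parent scope was left untouched
```

Both passes number the footnote 1, because the trial pass stored its update under the child scope's key rather than the parent's.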

gfngfn commented 3 years ago

Thanks for additional suggestions. I have a few remarks, however:

- The first suggestion looks unrealizable, since contexts are passed “from the outside to the inside” but the opposite never happens (i.e., no command returns an updated context).
  - This is why I consider introducing immutable states, which can be returned by commands “from the inside to the outside”.
- I don’t really understand what duplicate-context : context -> context and apply-context : context -> context -> () in the first suggestion are intended to be, but as long as these primitives are meaningful, their existence at least indicates that contexts have mutable state. Such a semantics does not seem more elegant than one having mutable references.
- The second suggestion is not backward-compatible, since it will at least break the completeness of the type inference and will require every command (and thereby inline texts) to have a type parametrized by the 'a of 'a * context.
- Now that a few years have passed since SATySFi was first released and many imperfections of the language design have come to light during that time, I’m not so reluctant to break backward compatibility, as long as the versioning is carefully handled.
  - For instance, I’m currently replacing the module system with one based on F-ing modules.

gfngfn commented 3 years ago

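(The following was posted in Japanese as a rough translation of the response above, and is rendered here in English.)

Thank you for the further suggestions. However, there are a few points I would like to raise:

- The first proposal looks hard to realize, because in the current design text-processing contexts can be passed “from the outside to the inside” but never the other way around.
  - This is exactly why I introduced the idea of a semantics that passes immutable states around (with that approach, state can be propagated “from the inside to the outside”).
- I do not really understand what operations duplicate-context : context -> context and apply-context : context -> context -> () in the first proposal refer to, but if such operations are meaningful at all, then text-processing contexts hold rewritable state internally, and that does not seem any simpler than a semantics with mutable references.
- The second proposal is unfortunately not backward-compatible, I think. Allowing the zeroth argument to take more than one form would at least lose the completeness of type inference, and every command definition (and hence every inline text) would need to carry the polymorphism of the text-processing context as a type parameter.
- Several years have now passed since the first release of SATySFi, and during that time various weaknesses of the current language design have become apparent, so I am not averse to making incompatible changes as long as versioning is handled with proper care.
  - For example, I am working on an implementation that replaces the current module system with one based on F-ing modules.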

yasuo-ozu commented 3 years ago

Thanks for the reply.

The second suggestion is not backward-compatible

It can be compatible if the language restricts 'a to int list. In fact, in the example above, 'a is bound to int list.

Thanks for explaining the philosophy; I now understand that, in the language design, the context should be immutable. However, the state-passing example is redundant. Is there any other solution for this so far?