GrammaticalFramework / gf-core

Grammatical Framework core: compiler, shell & runtimes
https://www.grammaticalframework.org

LPGF: Linearisation-only PGF format #103

Open johnjcamilleri opened 3 years ago

johnjcamilleri commented 3 years ago

Introduction

Recently I've been working on resurrecting an old idea: adding support for a PGF file format which only supports linearisation, since this is actually quite a common use for GF. The motivations are:

  1. Faster & less memory-intensive compilation
  2. Smaller binary files
  3. Faster linearisation at runtime
  4. New features that are impossible when parsing must also be supported, e.g. a dynamic lexicon.

The format itself is described in section 2 of the paper:

"PGF: A Portable Run-Time Format for Type-Theoretical Grammars" Angelov, Bringert, Ranta (2009). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.640.6330&rep=rep1&type=pdf

(where it is confusingly called "PGF"; what we call "PGF" today is really "PMCFG", section 3 of the same paper).
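For readers less familiar with the term: linearisation means turning an abstract syntax tree into a string in a concrete language, without any parsing in the other direction. The following is only a toy sketch of that idea in Haskell. It is not the LPGF format or the code in LPGF.hs, and the names Tree, Concrete, linearise and foodsEng are hypothetical; real GF linearisations use records and tables rather than plain strings.

```haskell
import qualified Data.Map as Map

-- Abstract syntax tree: a function name applied to argument trees.
data Tree = App String [Tree]

-- Toy "concrete syntax" (hypothetical): each abstract function maps the
-- linearisations of its arguments to a single string.
type Concrete = Map.Map String ([String] -> String)

-- Linearise bottom-up. Returns Nothing if some function has no rule.
linearise :: Concrete -> Tree -> Maybe String
linearise cnc (App f args) = do
  rule <- Map.lookup f cnc
  strs <- mapM (linearise cnc) args
  return (rule strs)

-- A few rules in the spirit of the Foods grammar. The lambdas are partial
-- on purpose to keep the sketch short; they assume the right arity.
foodsEng :: Concrete
foodsEng = Map.fromList
  [ ("Pred",  \[item, quality] -> item ++ " is " ++ quality)
  , ("This",  \[kind] -> "this " ++ kind)
  , ("Pizza", \[] -> "pizza")
  , ("Warm",  \[] -> "warm")
  ]

main :: IO ()
main = print (linearise foodsEng
  (App "Pred" [App "This" [App "Pizza" []], App "Warm" []]))
-- prints: Just "this pizza is warm"
```

Because nothing here needs to be inverted for parsing, the concrete rules can be arbitrary functions, which is roughly why a linearisation-only format can be simpler than one that must also support parsing.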

Progress so far

This draft pull request contains the following:

  1. An implementation of the LPGF format and runtime (src/runtime/haskell/LPGF.hs) which is correct w.r.t. the PGF and PGF2 implementations, with the exception of:
    1. Linearisation of missing functions (low priority)
    2. Variants, which are intentionally not supported
  2. Compilation from GF (canonical) to LPGF (src/compiler/GF/Compiler/GrammarToLPGF.hs), which can be used in the expected way: gf --make --output-format=lpgf ...
  3. Test suite with unit-test, Foods, and Phrasebook grammars for testing correctness.
  4. Benchmark for comparing performance between PGF, PGF2 and LPGF.

Notable omissions

Performance

Unfortunately, so far I haven't been able to live up to all the performance goals:

So my current focus is on improving the performance of the LPGF compiler, which I am struggling with. I have done what I can to improve the data structures and algorithms used, but I am rather inexperienced with tuning strictness and other aspects of Haskell performance. If anyone has more expertise in this area, please let me know and I can be more specific about where the bottlenecks are and what I've tried already. Until then, this pull request can remain open and serve as the place where any major updates to this project are posted.