divvun / divvun-gramcheck-web

Grammar checker for web word processors, targeted at minority and indigenous languages, but open for everyone.
GNU General Public License v3.0

Putting the *-release.cg3 mechanism into use #73

Closed Trondtr closed 11 months ago

Trondtr commented 1 year ago

Today the grammar checker file comes in two shapes, grammarchecker.cg3 and grammarchecker-release.cg3. The setup is not used as intended: for sme we work on the latter file, for the other languages we work on the former. What is needed is the following:

  1. We all work on the file grammarchecker.cg3.
  2. We mark all ADD rules not fit for release, e.g. with an initial x (ADD:x).
  3. In the make routine we add a procedure that comments out all ADD rules marked with x, thereby generating a grammarchecker-release.cg3 file that is strictly not for editing and contains only the rules intended for publication.
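
Step 3 could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the script that was eventually used: the function name `make_release` is hypothetical, a rule is assumed to start on a line beginning with `ADD:x` and to end on the first line containing `;`, and escaped semicolons (`\;`) or semicolons inside comments are not handled. Each input line yields exactly one output line, so line numbers stay the same in both files.

```shell
# Hypothetical helper: comment out every ADD:x rule, keeping one output line
# per input line so that line numbers are identical in both files.
# Assumption: a rule ends on the first line that contains ';'.
make_release() {
    awk '
        /^[[:space:]]*ADD:x/ { in_rule = 1 }   # an unreleased rule starts here
        in_rule {
            if ($0 ~ /;/) in_rule = 0          # the rule ends on this line
            print "#" $0                       # comment the line out
            next
        }
        { print }                              # copy everything else unchanged
    ' "$1"
}
```

It would be called as `make_release grammarchecker.cg3 > grammarchecker-release.cg3`.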

The issue has become urgent because of the upcoming NoDaLiDa conference: in order to write sensible articles we need to focus on a subset of the rules.

For smn, fao and nob the rules are already marked with x, so testing the release procedure requires removing the x from one of the rules.

Trondtr commented 1 year ago

Not being a programmer, I wrote a command line that did the trick: it removes all comments, puts each rule on a single line (OPERATOR + rule content + ;), and comments out all x-marked rules. The fuss is needed because the files spread the rules over several lines. What is missing now is cleaning this up and adding grammarchecker-release to the makefile setup.

 cat grammarchecker.cg3 |\
 tr '\t' ' '|\
 sed 's/\\;/semicolon/g;'|\
 sed 's/^#/∫/'|\
 sed 's/ #/ ∫/g;'|\
 sed 's/;/;∆∫/g;'|\
 cut -d"∫" -f1|\
 tr '\n' ' '|\
 sed 's/\(SECTION[^ ]*\) /\1∆/g;'|\
 tr '∆' '\n'|\
 sed 's/^ *//g'|\
 sed 's/ADD:x/#ADD:x/' |\
 uniq |\
 sed 's/semicolon/\\;/g;' > grammarchecker-release.cg3

snomos commented 1 year ago

@flammie could you have a look at this? See also the following commit, especially the commit message:

https://github.com/giellalt/lang-smn/commit/06651d37eaae29b919afad8ea73b4cde0863377f

(or the following set of commits: https://github.com/giellalt/lang-smn/compare/4f1951879dcd...16d9d8e84451)

Not sure which of the two approaches is more user friendly when editing a CG file. Whatever you choose to do, the overall goal is simplicity for the CG/grammar checker developer.

Programmatically, the goal is to automatically create a derived grammar checker file used for production, as a copy of the development version, but with unfinished rules either commented out or removed. The dev rules should be marked somehow to make the conversion automatic.

@lynnda-hill sending this to @flammie 😄

flammie commented 1 year ago

I wrote a gawk script that handles a few more cases of ADD:x rules. On IRC we planned a slightly more elegant solution, either with potential future CG tooling or by using CG's parser (e.g. vislcg3 --dump-ast).

snomos commented 1 year ago

Nice. Using the CG tooling somehow seems like a good idea - then the CG file parsing is already in place. Keep in mind that the derived grammar checker / CG file still needs to be debuggable and traceable, preferably with either the generated source file as a reference, or the original source file (whatever is easiest) - also the production version needs to be tested and debugged if needed 🙂

flammie commented 1 year ago

Luckily, both ways should preserve line numbers and identifiers for tracing.
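
One cheap way to check the line-number part of this is to compare the line counts of the development and release files. The sketch below is only an illustration: the function name `same_line_count` is hypothetical, and an equal line count is a necessary condition for matching line numbers, not a proof of it.

```shell
# Hypothetical check: the dev and release grammars should have equally many
# lines, otherwise line numbers in trace output would point at different rules.
same_line_count() {
    # $(( ... )) strips the padding that some wc implementations add
    dev_lines=$(( $(wc -l < "$1") ))
    rel_lines=$(( $(wc -l < "$2") ))
    [ "$dev_lines" -eq "$rel_lines" ]
}
```

For example, `same_line_count grammarchecker.cg3 grammarchecker-release.cg3 || echo "line numbers differ"` could run right after the release file is generated.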

snomos commented 1 year ago

I moved the script from SMN to Giella-core.

snomos commented 11 months ago

This is now implemented for all Sámi languages with a grammar checker. Closing.

Trondtr commented 11 months ago

Note that nob and fao also have a grammar checker.