Closed Trondtr closed 11 months ago
Not being a programmer, I wrote a command line that does the trick: it removes all comments, puts each rule on one line (OPERATOR + rule content + ;), and comments out all x-marked rules. The fuss is needed because the files spread the rules over several lines. What is still missing is (cleaning this up and) adding grammarchecker-release to the makefile setup.
cat grammarchecker.cg3 |\
tr '\t' ' '|\
sed 's/\\;/semicolon/g;'|\
sed 's/^#/∫/'|\
sed 's/ #/ ∫/g;'|\
sed 's/;/;∆∫/g;'|\
cut -d"∫" -f1|\
tr '\n' ' '|\
sed 's/\(SECTION[^ ]*\) /\1∆/g;'|\
tr '∆' '\n'|\
sed 's/^ *//g'|\
sed 's/ADD:x/#ADD:x/' |\
uniq |\
sed 's/semicolon/\\;/g;' > grammarchecker-release.cg3
@flammie could you have a look at this? See also the following commit, especially the commit message:
https://github.com/giellalt/lang-smn/commit/06651d37eaae29b919afad8ea73b4cde0863377f
(or the following set of commits: https://github.com/giellalt/lang-smn/compare/4f1951879dcd...16d9d8e84451)
Not sure which of the two approaches is the most user-friendly when editing a CG file. Whatever you choose to do, the overall goal is simplicity for the CG/grammar checker developer.
Programmatically, the goal is to automatically create a derived grammar checker file used for production, as a copy of the development version, but with unfinished rules either commented out or removed. The dev rules should be marked somehow to make the conversion automatic.
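A minimal sketch of the "commented out" variant, assuming that unfinished rules are the ones whose first line starts with ADD:x and that a rule ends at the first unescaped ; outside a trailing # comment. The function name comment_out_dev_rules is hypothetical, and the sketch ignores # or ; inside quoted tags, so it is an illustration rather than the script that ended up in Giella-core. Every line stays in place, which keeps line numbers in the derived file identical to the source.

```shell
#!/bin/sh
# Hypothetical helper: comment out development-only rules (marked ADD:x)
# line by line, so the derived file has the same line numbering as the source.
# Reads a CG3 grammar on stdin, writes the release version to stdout.
comment_out_dev_rules() {
    awk '
        # A development rule starts on a line that opens with ADD:x.
        /^[ \t]*ADD:x/ { in_rule = 1 }
        in_rule {
            print "#" $0
            # Look for the terminating semicolon, ignoring escaped
            # semicolons and anything after a trailing comment marker.
            line = $0
            gsub(/\\;/, "", line)
            sub("#.*", "", line)
            if (line ~ /;/) in_rule = 0
            next
        }
        # Everything else is copied unchanged.
        { print }
    '
}

# Usage (file names as used earlier in this thread):
#   comment_out_dev_rules < grammarchecker.cg3 > grammarchecker-release.cg3
```

Because rules are commented out rather than deleted or joined onto one line, a trace from the release grammar can be looked up directly in the development file.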
@lynnda-hill sending this to @flammie 😄
I wrote a gawk script that handles a few more cases of ADD:x rules. We planned a slightly more elegant solution on IRC, with potential future CG tooling or otherwise using CG's parser (e.g. vislcg3 --dump-ast).
Nice. Using the CG tooling somehow seems like a good idea - then the CG file parsing is already in place. Keep in mind that the derived grammar checker / CG file still needs to be debuggable and traceable, preferably with either the generated source file as a reference, or the original source file (whatever is easiest) - also the production version needs to be tested and debugged if needed 🙂
Luckily, both ways should preserve line numbers and identifiers for tracing.
I moved the script from SMN to Giella-core.
This is now implemented for all Sámi languages with a grammar checker. Closing.
Note that nob and fao also have a grammar checker.
Today the grammarchecker file comes in two shapes, grammarchecker.cg3 and grammarchecker-released.cg3. The setup is not in use: for sme, we work on the latter file; for the other languages we work on the former. What is needed is the following:
- marking the ADD rules, e.g. with an initial x on the rules not fit for release
- filtering out the rules marked with x, thereby generating a grammarchecker-released.cg3 file strictly not for editing, containing only the rules marked for publication

The issue is made pressing by the upcoming NoDaLiDa conference: in order to write sensible articles we need to focus on a subset of the rules.
For smn, fao and nob the rules are already marked with x; testing the release procedure thus requires removing the x from one of the rules.
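For such a test, something along these lines could unmark a single rule in a scratch copy of the grammar. This is only a sketch: the function name unmark_first_dev_rule is hypothetical, and it assumes the marker is the literal prefix ADD:x, as in the command line earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: remove the x marker from the first ADD:x rule only,
# leaving all other rules marked. Reads stdin, writes stdout.
unmark_first_dev_rule() {
    awk '
        # sub() returns 1 on the first line where the marker is found;
        # after that, done blocks any further substitutions.
        !done && sub(/ADD:x/, "ADD:") { done = 1 }
        { print }
    '
}

# Usage: work on a scratch copy, then count how many rules are still marked.
#   unmark_first_dev_rule < grammarchecker.cg3 > /tmp/grammarchecker-test.cg3
#   grep -c 'ADD:x' /tmp/grammarchecker-test.cg3
```

After running the release procedure on the scratch copy, exactly one rule should come through active; the rest should still be commented out or removed.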