matyaskopp closed this issue 1 year ago.
I am a bit sceptical that this still makes sense now, but it may be useful for those who have not submitted yet (and for future generations of ParlaMinters). I didn't use chars.pl, as it is designed a bit differently; instead, I incorporated the relevant code directly into validate-parlamint.pl. I didn't test it extensively, but I hope it works. TAB can't be forbidden, as it may appear because of XML indentation.
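A minimal sketch of such a check, written in Python rather than Perl and not taken from validate-parlamint.pl. It assumes that "bad" characters are Unicode control characters (category `Cc`) plus the replacement character U+FFFD; the function names and the (line, column, code point) output are hypothetical. TAB, LF and CR are explicitly allowed, since XML indentation and line breaks produce them.

```python
import unicodedata

# Assumption: control characters (category Cc) and U+FFFD count as "bad".
# TAB, LF and CR are allowed because XML indentation and line breaks use them.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def is_bad_char(ch: str) -> bool:
    """Return True if ch should be reported by this hypothetical check."""
    if ch in ALLOWED_CONTROLS:
        return False
    return unicodedata.category(ch) == "Cc" or ch == "\ufffd"


def find_bad_chars(text: str):
    """Yield (line, column, "U+XXXX") for each bad character, 1-based.

    Splitting only on "\n" keeps line numbers aligned with the XML file
    (str.splitlines would also split on other control characters).
    """
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if is_bad_char(ch):
                yield lineno, col, f"U+{ord(ch):04X}"


sample = "<u>\tHello\x07 world\ufffd</u>\n"
print(list(find_bad_chars(sample)))  # [(1, 10, 'U+0007'), (1, 17, 'U+FFFD')]
```

The TAB at column 4 is not reported, which reflects the reason given above for not forbidding it.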
This has now been implemented. Recently, ERROR was also changed to WARN for bad characters, so validation no longer fails when some are encountered. Closing the issue; if anything else related to this needs discussing, we can open a new one.
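The difference between ERROR and WARN comes down to whether findings affect the exit status. This is a small Python sketch of that idea, not the actual validate-parlamint.pl logic. The message format, the `strict` flag and the finding tuples are hypothetical.

```python
import sys
from typing import Iterable, List, Tuple

# Hypothetical finding format: (line, column, code point such as "U+0007").
Finding = Tuple[int, int, str]


def report_bad_chars(path: str, findings: Iterable[Finding],
                     strict: bool = False) -> Tuple[int, List[str]]:
    """Print one message per finding to stderr; return (exit status, messages).

    strict=False mirrors the WARN behaviour: findings are reported, but the
    status stays 0, so validation does not fail. strict=True reports ERROR
    and returns status 1 if anything was found.
    """
    level = "ERROR" if strict else "WARN"
    messages = [f"{level}: {path}:{line}:{col}: bad character {cp}"
                for line, col, cp in findings]
    for msg in messages:
        print(msg, file=sys.stderr)
    status = 1 if strict and messages else 0
    return status, messages


status, _ = report_bad_chars("ParlaMint-XX.xml", [(1, 10, "U+0007")])
print(status)  # 0: a warning is printed, but validation still passes
```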
We have scripts for character stats: https://github.com/clarin-eric/ParlaMint/blob/5deaeed5ae792f3ba1726072298885b5b64a6d64/Makefile#L217-L232
But it is not used in the validation procedure.
TODO: extend validate-parlamint.pl with character validation:
NOTE: There is no need to create temporary files. validate-parlamint.pl can contain a list of invalid characters and check that the chars.pl output does not contain any of them.
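This Python sketch shows the no-temporary-files approach under stated assumptions. Character counts are built in memory as a stand-in for the statistics chars.pl produces (its real output format is not shown here). They are then checked against a list of invalid characters kept inside the validator. `INVALID_CHARS` and both function names are hypothetical, and the actual list would need to be agreed on for ParlaMint.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical list of invalid characters kept inside the validator.
INVALID_CHARS = {"\x00", "\x07", "\x0b", "\x0c", "\ufffd"}


def char_stats(chunks: Iterable[str]) -> Counter:
    """Count characters in memory (a stand-in for chars.pl statistics)."""
    stats: Counter = Counter()
    for chunk in chunks:
        stats.update(chunk)  # counts each character of the string
    return stats


def invalid_in_stats(stats: Counter, invalid=INVALID_CHARS) -> Dict[str, int]:
    """Return {"U+XXXX": count} for every invalid character that occurs."""
    return {f"U+{ord(ch):04X}": n
            for ch, n in sorted(stats.items()) if ch in invalid}


stats = char_stats(["Hello\x07", " world\ufffd\ufffd"])
print(invalid_in_stats(stats))  # {'U+0007': 1, 'U+FFFD': 2}
```

Because `char_stats` accepts any iterable of strings, an open file can be passed directly (for example `char_stats(open(path, encoding="utf-8"))`). The file is then streamed line by line, and no intermediate file is ever written.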