Filosoft / vabamorf

Estonian morphological analyzer

Lexicon compiler outputs an unhelpful error message if lexicon contains a syntax error #21

Open Kaljurand opened 9 years ago

Kaljurand commented 9 years ago

Steps to reproduce:

  1. Add the line "sage:0905|" (or another syntactically incorrect entry) somewhere into "dct/data/mrf/fs_lex"
  2. Run nullist-uus-sonastik.sh
  3. After a while the script stops and prints "Ei leia , märki" ("Cannot find the , character") to the console

It would be better if every line that contains a syntax error (or at least the first one) were output along with its line number and the name of the lexicon source file. This would make locating the errors much easier.

While the lexicon source distributed with Vabamorf does not contain any errors, lexicons automatically generated from external resources most likely will (during the development of conversion scripts).

merisiga commented 9 years ago

I think this is a non-issue. A standard, normal way of adding new entries to the lexicon would be the following:

  1. Choose the tüüpsõna (example word?) (kõne) for the one you want to add (ruse).
  2. Take the dictionary entry for kõne and modify it to be suitable for ruse by making some stem changes in the entry.
  3. Add the new entry.

You see, there is not much chance for syntactic errors here.

Kaljurand commented 9 years ago

The use case that I have in mind is somebody developing a converter from a more human readable lexicon format to the Vabamorf lexicon format. It would be good then if the lexicon compiler could provide more detailed feedback. (Ideally there could be an API for adding entries dynamically at runtime, and not from strings but structured objects, but that's another issue.)

But use cases are unpredictable. My main argument is that "Ei leia , märki" is simply a very bad error message. A compiler should say more than just ERROR, especially if it parses its input line by line, so that reporting line numbers is straightforward. So I do think that this is an issue, one that should be resolved in the long term at least.
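
To illustrate the kind of feedback meant here, below is a minimal Python sketch of a line-by-line check that reports the file name, line number, and offending line. It is not Vabamorf code and does not implement the real lexicon grammar: the only rule it checks (every non-blank entry must contain a comma) is an assumption inferred from the "Ei leia , märki" message, and the names `LexiconSyntaxError` and `find_syntax_errors` as well as the placeholder entries `a,b` and `c,d` are made up for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class LexiconSyntaxError:
    """One reported problem: where it is and what is wrong (hypothetical type)."""
    filename: str
    line_no: int
    line: str
    reason: str

    def __str__(self) -> str:
        return f"{self.filename}:{self.line_no}: {self.reason}: {self.line!r}"


def find_syntax_errors(lines: Iterable[str], filename: str) -> Iterator[LexiconSyntaxError]:
    """Yield one error per malformed line instead of stopping at the first."""
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # assumption: blank lines are harmless
        # Assumed rule, inferred only from the message "Ei leia , märki":
        # an entry must contain a comma. The real format has more rules.
        if "," not in line:
            yield LexiconSyntaxError(filename, line_no, line, "missing ',' character")


# Placeholder input; only "sage:0905|" comes from the steps to reproduce.
sample = ["a,b\n", "sage:0905|\n", "c,d\n"]
for err in find_syntax_errors(sample, "fs_lex"):
    print(err)  # prints: fs_lex:2: missing ',' character: 'sage:0905|'
```

Reporting through a generator means a caller can stop at the first error or collect all of them, which covers both variants ("every" or "just the first") mentioned in the original request.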