UniversalDependencies / tools

Various utilities for processing the data.
GNU General Public License v2.0

evaluate_treebank.pl "WARNING: Validator not found" #80

Closed ALeczkowski closed 3 years ago

ALeczkowski commented 3 years ago

Hello,

I was hoping to get your help with an issue I have with the evaluate_treebank.pl script. The evaluation runs, and for many treebanks I get the same results as on the https://universaldependencies.org website. However, I also get the message "WARNING: Validator not found. We will assume that all files are valid." As a result, for treebanks that do not score 1 in the validity test, I get ratings that differ substantially from those posted on https://universaldependencies.org.

I do not know Perl, so I was not able to figure out what the issue is. I would appreciate your help.

PS. validate.py script is present in the directory where it is supposed to be (I think I tried all possible combinations)

dan-zeman commented 3 years ago

> PS. validate.py script is present in the directory where it is supposed to be

So how do you determine where it is supposed to be?

The way the script is written, it should not matter where exactly your copy of validate.py is located as long as the folder is in your PATH (the environment variable that specifies where the system should look for executable files). Also, the script should have permissions to be executed by the current user. On a Unix-like system, you could try the command

which validate.py

If it returns the path to your copy of validate.py, then evaluate_treebank.pl should be able to locate it, too.
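The same check can be done from Python, which may help on systems without `which`. This is a minimal sketch, not part of the UD tools: the helper name `find_validator` is hypothetical, and it relies only on the standard-library `shutil.which`, which searches the folders in PATH and by default only reports files the current user may execute.

```python
import shutil


def find_validator(name="validate.py", path=None):
    """Return the full path of an executable `name` found on PATH, or None.

    `path` may be a search path string to use instead of the PATH
    environment variable (handy for testing); None means "use PATH".
    """
    return shutil.which(name, path=path)


if __name__ == "__main__":
    location = find_validator()
    if location is None:
        # Either no folder on PATH contains the file, or it lacks execute permission.
        print("validate.py not found on PATH (or not executable)")
    else:
        print("validate.py found at", location)
```

If this reports that the file was not found, the usual causes are the two mentioned above: the folder is missing from PATH, or the file is not executable for the current user.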

ALeczkowski commented 3 years ago

Thank you for your reply! I had been adding the folder containing the script to PATH incorrectly (I am not a frequent user of Linux).

I have one more question about the evaluate_treebank.pl script and the validate.py script.

Running the script evaluate_treebank.pl on, for instance, Polish LFG results in a score much different (very low) than the score presented at the Universal Dependencies' website; it does not pass the validation script. A piece of the output:

[Line 22 Sent dev-1]: [L4 Morpho feature-not-permitted] Feature SubGender is not permitted in language [pl].

I assume that the validate.py script, which is run from within the evaluate_treebank.pl script, should be used with the parameter --level 3. Running it with that parameter gives the same results as shown on the UD website for Polish LFG.

Do I understand correctly that you performed the evaluation of all of the treebanks with the following code in lines 393 and 397 of the evaluate_treebank.pl script (not testing the language-specific labels and contents):

393 $command = "./validate.sh --lang $lcode --level 3 --max-err=10 $folder/$file";
397 $command = "validate.py --lang $lcode --level 3 --max-err=10 $folder/$file";

and not as it is, that is:

393 $command = "./validate.sh --lang $lcode --max-err=10 $folder/$file";
397 $command = "validate.py --lang $lcode --max-err=10 $folder/$file";

?

dan-zeman commented 3 years ago

No. The validator should operate at the default level 5. Note that the validator evolves and the current version may not be the same as the one that was available at the time of the last release, when the last evaluation was performed. This is the case now: The test for feature-not-permitted was not available at the time of release 2.7.
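To compare the two settings side by side, the validator call can be assembled with and without `--level`. This is a minimal sketch: the helper `validator_command` and the example file name are hypothetical, and only the options that appear in this thread (`--lang`, `--level`, `--max-err`) are used. Leaving `level` unset omits the option, so the validator runs at its default level.

```python
def validator_command(lcode, conllu_path, level=None, max_err=10):
    """Build the argument list for one validate.py call.

    level=None leaves out --level, so the validator uses its default;
    an explicit level (e.g. 3) restricts the tests that are run.
    """
    cmd = ["validate.py", "--lang", lcode]
    if level is not None:
        cmd += ["--level", str(level)]
    cmd += [f"--max-err={max_err}", conllu_path]
    return cmd


if __name__ == "__main__":
    # Hypothetical file name, used only for illustration.
    print(" ".join(validator_command("pl", "pl_lfg-ud-dev.conllu")))
    print(" ".join(validator_command("pl", "pl_lfg-ud-dev.conllu", level=3)))
```

The resulting list can be passed to `subprocess.run` once validate.py is on PATH. A score that only matches the website with `level=3` suggests that a newer language-specific test is failing, not that the published evaluation used level 3.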

You can see the full evaluation log in eval.log in the master branch of each treebank, e.g. the one for Polish LFG. Among other things, it tells you the exact version (commit id) of the tools that were used for the evaluation. If you return to that commit, you should obtain the same result.

dan-zeman commented 3 years ago

> it tells you the exact version (commit id) of the tools that were used

Oh. Now that I look at it, the date is too old. It is actually just the version of evaluate_treebank.pl itself, not of the whole tools repository, so it does not reflect the version of the validator. I have to look into this and make sure that a more meaningful message is generated. Anyway, you can assume that the latest commit to the tools repository before the evaluation was run (11 Nov 2020) is probably the right one.
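One way to locate that commit in a local clone of the tools repository is `git rev-list` with a date cutoff. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the branch is assumed to be `master`, and the exact time of day of the evaluation is not given in this thread, so the cutoff string is a guess that may need adjusting.

```python
import subprocess


def rev_list_command(before, branch="master"):
    """Build a git command that prints the newest commit on `branch` older than `before`."""
    return ["git", "rev-list", "-n", "1", f"--before={before}", branch]


def last_commit_before(repo_dir, before, branch="master"):
    """Run the command inside `repo_dir` (a local clone) and return the commit id."""
    result = subprocess.run(
        rev_list_command(before, branch),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git reports a failure
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Assumed cutoff: start of 11 Nov 2020; the actual evaluation time is unknown.
    print(" ".join(rev_list_command("2020-11-11 00:00")))
```

Calling `last_commit_before` with the path to the clone returns a commit id that can then be checked out before rerunning evaluate_treebank.pl.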

ALeczkowski commented 3 years ago

Thanks for your help!