clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

validate-parlamint.pl: stderr for warnings and errors, stdout for the rest, please #863

Closed BartJongejan closed 7 months ago

BartJongejan commented 7 months ago

In the output from validate-parlamint.pl, I overlooked a single file that did not validate. I did the following:

perl validate-parlamint.pl ../Schema/ /mnt/c/gitprojects/ParlaMint/Samples/ParlaMint-DK > perllog 2>perlerr &

However, perllog remains at zero bytes, and all output is sent to perlerr. It would be nice to have the good stuff being sent to stdout. Alternatively, a summary at the end of the output could tell how many INFO, WARNING and ERROR messages are in the output, so the user knows what to look for in a textual search through the output. (I was not sure what to look for, so I did just a quick scroll through the thousands of output lines in the hope of finding aberrant output lines.)

TomazErjavec commented 7 months ago

However, perllog remains at zero bytes, and all output is sent to perlerr.

Yes, this is intentional, i.e. all ParlaMint scripts send info/warning/error messages to STDERR. I don't really see a problem with that. In fact, it seems better than the alternative, where some scripts (those that actually write their output to STDOUT) would send messages to STDERR while others (like validate-parlamint.pl) would send them to STDOUT. That would just make "catching" error messages passed between scripts harder.
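To illustrate what this means for the command in the report, here is a minimal sketch. The function fake_validator is a hypothetical stand-in for a ParlaMint script, and the message wording is invented for illustration; only the stream behaviour is the point.

```shell
# Hypothetical stand-in for a script that reports only on STDERR.
fake_validator() {
  echo "INFO: validating sample file" >&2
  echo "ERROR: sample file is not valid" >&2
}

logdir=$(mktemp -d)

# Same redirections as in the report: STDOUT to perllog, STDERR to perlerr.
fake_validator > "$logdir/perllog" 2> "$logdir/perlerr"

# perllog stays empty, perlerr holds every message.
wc -c < "$logdir/perllog"
wc -l < "$logdir/perlerr"
```

So a single redirection of stream 2 is enough to capture the whole log.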

It would be nice to have the good stuff being sent to stdout.

Well, there is no "good stuff" as such - if everything is perfectly ok, there are just a lot of INFO messages. It would also not be a good idea to separate INFO (STDOUT) from WARN and ERROR (STDERR), because the purpose of the INFO messages is to give you the context in which the error / warning occurred - without them you could only guess where the validation went wrong.
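A small sketch of why the shared stream helps: with INFO and ERROR lines in one log, the context of an error can be pulled out together with it. The log lines below are invented for illustration, and -B is a context option of GNU and BSD grep rather than a POSIX one.

```shell
log=$(mktemp)

# Hypothetical log content; the real message wording may differ.
cat > "$log" <<'EOF'
INFO: Validating file A
INFO: Validating file B
ERROR: file B is not valid
INFO: Validating file C
EOF

# -B 1 prints one line of leading context before each matching line.
context=$(grep -i -B 1 'error' "$log")
printf '%s\n' "$context"
```

If the INFO lines had gone to a different file, the line naming file B would not be next to the error.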

Alternatively, a summary at the end of the output could tell how many INFO, WARNING and ERROR messages are in the output, so the user knows what to look for in a textual search through the output. (I was not sure what to look for, so I did just a quick scroll through the thousands of output lines in the hope of finding aberrant output lines.)

This would be technically hard to do, as it would mean that instead of just outputting something on STDERR, the top-level program would need to store all STDERR messages and, at the end, count them by type and output them together with the summary. Note that there is also more than one "top-level program" - in this case it was validate-parlamint, but this one is in turn called e.g. by parlamint2distro, so which one would do the summary? It seems like a lot of extra complication just to get a summary number - I just use grep -i error over the log. But it is true that this could be documented somewhere.
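A summary can also be produced after the fact from the captured log, without any change to the scripts. The following is only a sketch: summarize_log is a hypothetical helper, the log lines are invented, and it assumes each message contains its level keyword literally.

```shell
# Hypothetical helper: count message types in a captured log ($1).
summarize_log() {
  log=$1
  printf 'INFO\t%s\n'  "$(grep -c 'INFO' "$log")"
  printf 'WARN\t%s\n'  "$(grep -c 'WARN' "$log")"
  # Case-insensitive, so lower-case errors are counted too.
  printf 'error\t%s\n' "$(grep -ci 'error' "$log")"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
INFO: Validating file A
WARN: file A has something unusual
INFO: Validating file B
ERROR: file B is not valid
file-b.xml:12:5: error: element not allowed here
EOF

summary=$(summarize_log "$sample")
printf '%s\n' "$summary"
```

The counts are per line, so a line that mentions two keywords is counted under both.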

Anyway, these are my reactions, maybe @matyaskopp has a different perspective?

matyaskopp commented 7 months ago

Agree with @TomazErjavec.

I have one technical point: when you print to both STDERR and STDOUT, programs behave differently. They usually buffer output internally and flush it to the stream only once enough has accumulated, so STDERR and STDOUT can be flushed in a different order than the messages were produced, which can make merging them back together impossible. It is safer to write everything to one stream and split it afterwards.
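Splitting afterwards could look roughly like this. It is a sketch under assumptions: split_log is a hypothetical helper, the log lines are invented, and "error" as the split criterion is just one possible choice.

```shell
# Hypothetical helper: split one captured log ($1) into two files.
split_log() {
  log=$1
  grep -i  'error' "$log" > "$log.errors"   # lines mentioning an error
  grep -vi 'error' "$log" > "$log.rest"     # everything else
}

workdir=$(mktemp -d)
cat > "$workdir/all.log" <<'EOF'
INFO: Validating file A
WARN: file A has something unusual
INFO: Validating file B
ERROR: file B is not valid
EOF

split_log "$workdir/all.log"
wc -l < "$workdir/all.log.errors"
wc -l < "$workdir/all.log.rest"
```

The original all.log keeps the messages in the order they were written, which the two split files alone could not reconstruct.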

BartJongejan commented 7 months ago

OK, if every kind of error is found by grepping "error" (Lower case? That's odd, if infos and warnings are shouted in capitals.) then that's fine. Provided it is documented, as @TomazErjavec proposes. Are there other types of notifications to be aware of?

TomazErjavec commented 7 months ago

if every kind of error is found by grepping "error"

Yes, exactly.

(Lower case? That's odd, if infos and warnings are shouted in capitals.)

Our ERRORs are shouted in capitals too, but we have no control over errors that are reported by jing and the UD validator - there they are unfortunately in lower case, hence "grep -i error", which catches both.

Provided it is documented, as @TomazErjavec proposes.

Yes, we need to think about where to mention this, as there are many options....

Are there other types of notifications to be aware of?

Well, you also have FATAL ERROR, where a script dies. But grep -i error will catch these as well. So, in short, no, this is it.
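As a quick check of that claim, the sketch below compares a case-sensitive and a case-insensitive search over a log with all three kinds of error. The log lines are invented for illustration and are not actual output of the ParlaMint scripts, jing or the UD validator.

```shell
errlog=$(mktemp)

# Hypothetical log with an upper-case, a lower-case and a fatal error.
cat > "$errlog" <<'EOF'
INFO: Validating file A
ERROR: file A has a problem
file-a.xml:12:5: error: element not allowed here
FATAL ERROR: cannot continue
EOF

# Case-sensitive search misses the lower-case error.
upper_only=$(grep -c 'ERROR' "$errlog")
# Case-insensitive search catches all three kinds.
any_case=$(grep -ci 'error' "$errlog")

echo "ERROR only: $upper_only, any case: $any_case"
```

Note that grep -i error matches any line containing the word, so an INFO line that happens to mention "error" would show up as well.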

Closing this now, with a reminder to ourselves to explain the 3 types of errors somewhere.