enasequence / sequencetools

Webin sequence validation API.
Apache License 2.0

Values missing in error messages (template unfilled)? #12

Open peterjc opened 7 years ago

peterjc commented 7 years ago

e.g.

$ java -jar embl-api-validator-1.1.146.jar test.embl 

***MESSAGES SUMMARY***
Compressed messages (occurring more than 5 times)
INFO: organism classified. Recruiting translation table from taxonomy. Using /transl_table="{0}". (4696 occurrences) (CDSTranslator-11) 
WARNING: Features sharing the locus_tag "{0}" have locations overlapping with locus_tag "{1}". (643 occurrences) (LocusTagCoverageCheck) 
ERROR: Sequence contains a stretch of 'n' characters between base {0} and {1} that is not represented with a "gap" feature (stretches of n greater than {2} gives a warning, greater than {3} gives an error). (26 occurrences) (SequenceToGapFeatureBasesCheck-1) 
...

Notice the apparent error message template markers {0}, {1}, {2}, and {3}.

In this case, since this was a bacterium, I presume the first INFO line should have been:

INFO: organism classified. Recruiting translation table from taxonomy. Using /transl_table="11". (4696 occurrences) (CDSTranslator-11) 

However, in general the missing values make the error messages overly cryptic.

kethireddy commented 7 years ago

If the same error message repeats more than 5 times, the validator compresses the occurrences into a single summary line.
The detailed error messages should be written to the report files (VAL_ERROR, VAL_INFO, VAL_SUMMARY) in the validator execution directory.

peterjc commented 7 years ago

OK, but many of these messages were identical. This means the "compression" has replaced a useful error message with a cryptic one:

INFO: organism classified. Recruiting translation table from taxonomy. Using /transl_table="{0}". (4696 occurrences) (CDSTranslator-11) 

could have been:

INFO: organism classified. Recruiting translation table from taxonomy. Using /transl_table="11". (4696 occurrences) (CDSTranslator-11)

raskoleinonen commented 6 years ago

Perhaps if a given message template with variable placeholders produces <= 10 distinct filled-in messages (or some other small N), we could show the actual messages, each with its own count.

Otherwise we would show the template with its placeholders left unfilled and the total count (as now).
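
A minimal sketch of that proposal, not taken from the validator's code. It assumes the templates are `java.text.MessageFormat`-style patterns (the `{0}` markers suggest this, but it is not confirmed here); the class `MessageSummary`, the method `summarise`, and the constant `MAX_DISTINCT` are hypothetical names used only for illustration. Note that `MessageFormat` treats single quotes as quoting characters, so a template such as the `'n'` one above would need its quotes escaped before being formatted this way.

```java
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: summarises all occurrences of ONE message template.
public class MessageSummary {

    // Assumed threshold: the "<= 10 (or some other small N)" from the proposal.
    static final int MAX_DISTINCT = 10;

    /**
     * @param template a MessageFormat-style pattern, e.g. "Using /transl_table=\"{0}\"."
     * @param argsList one Object[] of placeholder values per occurrence
     * @return summary lines: filled-in messages with per-message counts when
     *         there are few distinct ones, otherwise the raw template with
     *         the total count (the current behaviour)
     */
    public static List<String> summarise(String template, List<Object[]> argsList) {
        // Count each distinct filled-in message, keeping first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Object[] args : argsList) {
            String filled = MessageFormat.format(template, args);
            counts.merge(filled, 1, Integer::sum);
        }

        List<String> lines = new ArrayList<>();
        if (counts.size() <= MAX_DISTINCT) {
            // Few variants: show the actual messages with their counts.
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                lines.add(e.getKey() + " (" + e.getValue() + " occurrences)");
            }
        } else {
            // Too many variants: fall back to the unfilled template.
            lines.add(template + " (" + argsList.size() + " occurrences)");
        }
        return lines;
    }
}
```

With 4696 occurrences that all carry the value "11", this would yield a single line ending in `Using /transl_table="11". (4696 occurrences)`, while a template with hundreds of distinct locus_tag pairs would still be reported in its compressed form.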