Boteval / compare-classifiers

A library for juxtaposing classification performance over a given dataset
Eclipse Public License 1.0

When the id header is wrong/missing, a de-duplication error message is issued #5

Open matanox opened 7 years ago

matanox commented 7 years ago

In the current code base, when the object-id header named in the mapping file is absent from an input file, the following warning is shown instead of a message that identifies the actual problem, because every item ends up with the same empty or nil id:

warning: could not de-duplicate ― for the same id , some data rows defer in content

The cause is that the check and sanitization of duplicate ids run without, or before, any assertion that the object-id column is present in the input files.
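To illustrate the ordering in question, here is a minimal sketch in Python, used only for illustration and not taken from this code base. All names (`read_rows`, `deduplicate`, `load`, `MissingIdHeaderError`) are hypothetical, and CSV input is an assumption. The point is only that the header assertion runs before de-duplication, so a missing id column produces its own message rather than a de-duplication warning.

```python
import csv
import io


class MissingIdHeaderError(ValueError):
    """Hypothetical error: the configured object-id column is absent."""


def read_rows(text, id_header):
    """Parse CSV text, asserting the id column exists before returning rows."""
    reader = csv.DictReader(io.StringIO(text))
    headers = reader.fieldnames or []
    if id_header not in headers:
        # Fail here, with a message naming the real problem.
        raise MissingIdHeaderError(
            f"object-id header {id_header!r} not found; headers present: {headers}"
        )
    return list(reader)


def deduplicate(rows, id_header):
    """Drop exact duplicates; reject rows sharing an id but differing in content."""
    seen = {}
    for row in rows:
        key = row[id_header]
        if key in seen and seen[key] != row:
            raise ValueError(f"rows with id {key!r} differ in content")
        seen[key] = row
    return list(seen.values())


def load(text, id_header):
    # Header validation happens inside read_rows, strictly before de-duplication.
    return deduplicate(read_rows(text, id_header), id_header)
```

With this ordering, an input whose header row lacks the configured id column raises `MissingIdHeaderError` and never reaches `deduplicate`.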

This may be part of a larger question of when and how the column headers are asserted during input processing. It should therefore be fixed as part of a review of that wider scope, to avoid adding further faulty complexity.

In general, input validation has not received sufficient testing for appropriate error messaging, which is imperative for usage by a wider audience.