frictionlessdata / tableschema-js

A JavaScript library for working with Table Schema.
http://frictionlessdata.io/
MIT License
82 stars 27 forks source link

Support for more flexible error handling #139

Open am17torres opened 6 years ago

am17torres commented 6 years ago

I am trying to surface validation errors to my users when they upload a .csv file that does not conform to the rules defined for that template.

The current implementation of validation errors are okay for now, but there are a few use cases that I would like to be considered:

  1. Allow users to customize error messages for a particular error type. This should also incorporate string interpolation for things like row, column, field_name, constraint key, and constraint value so users could build the appropriate message. It would be nice if all of this information was available in the TableSchemaError class. So that the caller could use that information to organize errors into logical clusters. This was my attempt at organizing errors by field but it's pretty hacky (and doesn't work at all for the unique constraint because of bullet point 3)
const table = await Table.load(fileContent, { schema: rules })

const iterator = await table.iter({ keyed: true, extended: true, forceCast: true });
let errors = []
for(;;) {
      const iteration = await iterator.next();
      if (iteration.done) {
        break
      };
      const { value } = iteration
      if (value instanceof TableSchemaErrors.TableSchemaError) {
        const rowErrors: TableSchemaErrors.TableSchemaError[] = value.multiple ? value.errors : [value]
        for (const error of rowErrors) {
          const { message, rowNumber: row, columnNumber: col} = error
          const field = col ? table.headers[col-1] : 'global'
          const errorMessage = `${message} row "${row}"`
          if (!(field in errors)) {
            errors[field] = []
          }
          errors[field] = errors[field].concat(errorMessage);
        }
      }
    }
  1. Support for multiple languages. It would be nice if the tableschema library used IDs rather than strings so it would be easier to dynamically pick a message based on the users I18n an l10n preferences.

  2. Increased consistency to the data available for an error. For example, the unique constraint lacks a columnName. I can submit a pr for this if you think it makes sense.

// Check unique
      if (cast) {
        for (const [indexes, cache] of Object.entries(uniqueFieldsCache)) {
          const splitIndexes = indexes.split(',').map(index => parseInt(index, 10))
          const values = row.filter((value, index) => splitIndexes.includes(index))
          if (!values.every(value => value === null)) {
            if (cache.data.has(values.toString())) {
              const columnNumber = this._headers.indexOf(cache.name) // <------- I considered adding 
              if (columnNumber > -1) error.columnNumber = columnNumber // <----- to set the missing column
              const error = new TableSchemaError(
                `Row ${rowNumber} has an unique constraint ` +
                `violation in column "${cache.name}"`)
              error.rowNumber = rowNumber
              if (forceCast) return error
              throw error
            }
            cache.data.add(values.toString())
          }
        }
      }