We discussed in person whether, and to what extent, the importers should sanitize input. To avoid making too many assumptions and to keep control of standards in the hands of each data curator, we do not want to sanitize input. Instead, we want to provide feedback at the validation stage IF the input could cause confusion (for example, Plant Height vs. plant height could be considered duplicates). Given feedback that a value is very similar to one already in the database, the curator can judge for themselves whether the existing value or the current input needs to change.
Design
To accomplish this, we propose a new validator that performs the following checks on the specified input (for the Traits Importer specifically, we think this would apply to Trait Name, Method Short Name and Unit, since this combination forms a unique trait):
Possibly check for invalid characters. Examples include: leading or trailing whitespace, quotes, underscores, and symbols such as /, @, %, etc.
The validator could accept an array parameter of invalid characters?
A "loose" search in the database for possible duplicates of cvterms (e.g. make the query case insensitive, perhaps going so far as to allow 1 or 2 character mismatches).
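To make the two checks concrete, here is a minimal sketch in Python. It is an illustration only, not the proposed implementation: the function names, the default character list, and the `max_mismatches` parameter are all hypothetical, and a plain list of existing names stands in for the database query. Neither function changes the input; both only report what a curator might want to look at.

```python
from typing import Iterable, List

# Hypothetical default list; the proposal leaves the exact characters open
# and suggests passing them to the validator as an array parameter.
DEFAULT_INVALID_CHARS = ['"', "'", "_", "/", "@", "%"]


def find_invalid_chars(
    value: str, invalid_chars: Iterable[str] = DEFAULT_INVALID_CHARS
) -> List[str]:
    """Return a list of readable problems. The input is never modified."""
    problems = []
    if value != value.strip():
        problems.append("leading or trailing whitespace")
    for ch in invalid_chars:
        if ch in value:
            problems.append(f"contains {ch!r}")
    return problems


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete a character from a
                    current[j - 1] + 1,      # insert a character into a
                    previous[j - 1] + cost,  # substitute (free if equal)
                )
            )
        previous = current
    return previous[-1]


def find_possible_duplicates(
    value: str, existing: Iterable[str], max_mismatches: int = 2
) -> List[str]:
    """Return existing names within max_mismatches edits of value,
    compared case-insensitively. Exact matches are included."""
    needle = value.strip().lower()
    return [
        name
        for name in existing
        if edit_distance(needle, name.strip().lower()) <= max_mismatches
    ]


if __name__ == "__main__":
    print(find_invalid_chars(" Plant_Height"))
    print(find_possible_duplicates(
        "plant height", ["Plant Height", "Plant Weight", "Leaf Area"]
    ))
```

Note that with a tolerance of 1 or 2 characters, "plant height" is reported as similar to "Plant Weight" as well as to "Plant Height". This kind of false positive is one reason to present the result as feedback for the curator rather than as a hard validation failure.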
Branch
Proposed branch name: g4.80-sanitizedInputValidator
Groups
Group 4 - API | Services | Plugins