Several transformers try to grab the first hunk of text before a space to determine a date. That's not a great approach if that first hunk of text is too small to be a valid date and also too small to be a good quasi-unique identifier.
In New York, for example, there's an American Airlines entry for "2 /12 /2021" that comes in as simply "2", which could conflict with other bad entries.
If the first hunk is too short to be a date (the shortest plausible date, e.g. 1/1/23, is six characters), the whole string should probably be passed for a match.
value = value.split()[0].replace(",", "").replace(";", "")
Could be something like:
patched = value.split()[0].replace(",", "").replace(";", "")
if len(patched) >= 6:
    value = patched
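As a rough sketch of how that check could be shared across transformers, it might be wrapped in a helper. The function name `first_token_or_whole` and the constant `MIN_DATE_LENGTH` are hypothetical, not existing code in the repo, and the empty-string guard is an added assumption, since `value.split()[0]` raises an IndexError on blank input.

```python
# Assumption: six characters is the shortest plausible date, e.g. "1/1/23".
MIN_DATE_LENGTH = 6


def first_token_or_whole(value: str, min_length: int = MIN_DATE_LENGTH) -> str:
    """Return the cleaned first token if it is long enough to be a date.

    Otherwise return the original string untouched so the whole value
    can be passed along for a match.
    """
    tokens = value.split()
    if not tokens:
        # Blank or whitespace-only input: nothing to extract.
        return value
    # Same cleanup as the existing one-liner: drop commas and semicolons.
    patched = tokens[0].replace(",", "").replace(";", "")
    if len(patched) >= min_length:
        return patched
    return value
```

With this, the New York example "2 /12 /2021" would be passed through whole instead of collapsing to "2", while a value like "2/12/2021, revised" would still reduce to "2/12/2021".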