AAVLD-USAHA-ITStandards / eCVI

eCVI Data Exchange Standard (Starting with version 2)

Phone Number Validation #73

Open mkm1879 opened 2 years ago

mkm1879 commented 2 years ago

I've just had a CVI fail because a phone number included the 1 at the beginning. Should we allow an optional "1" and/or "+1" at the beginning of the phone number regex?

ryanscholzdvm commented 2 years ago

We fairly frequently have phone numbers held up in our system for the same reason: they fail validation because of the leading "1-", especially toll-free numbers. I would support allowing the optional extra digit.

MichaelJRussell commented 2 years ago

This seems to make sense, and would be fairly easy to implement. Is there any reason to limit it to the +1 prefix? That obviously would cover the common case, but would we have situations of international owners of animals moving in the US? For example, an animal owner with contact information in Mexico, housing horses in Texas that may move within the US?

ryanscholzdvm commented 2 years ago

This discussion appears to have been silent for a while. Did the committee want to try to add the allowance for leading digits to the phone number? Doing a bit of research on country codes, it looks like they vary in length, anywhere from 1 to 3 digits. There are probably a number of ways to structure the phone number to allow for international numbers if desired, but it may be simplest to allow only an optional "1" at the start of the 10-digit phone number, which covers US and Canadian numbers. We could also add "52" to allow for Mexican prefixes: "^(1|52)?(\d{10})$". Additional prefixes could be added if needed.

The other option is to just allow anything from 10-13 digits, but that seems like it would invite errors. "^\d{10,13}$"
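To make the trade-off between the two options concrete, here is a minimal Python sketch that runs both quoted patterns against a few made-up values. The sample numbers and the `accepts` helper are illustrative assumptions, and Python's `re` module is not the XSD regex dialect, so this only illustrates the matching logic.

```python
import re

# The two candidate patterns quoted in the comment above.
PREFIX_PATTERN = r"^(1|52)?(\d{10})$"  # optional "1" or "52", then 10 digits
LENGTH_PATTERN = r"^\d{10,13}$"        # any run of 10 to 13 digits

def accepts(pattern: str, value: str) -> bool:
    """Return True when the whole value matches the pattern."""
    return re.fullmatch(pattern, value) is not None

# Made-up sample values (digits only), for illustration.
samples = [
    "5551234567",    # plain 10-digit number
    "18005551234",   # leading 1 on a toll-free number
    "525512345678",  # leading 52
    "445512345678",  # leading 44, not in the prefix list
    "99955512345",   # 11 digits with no recognized prefix
]

for value in samples:
    print(value, accepts(PREFIX_PATTERN, value), accepts(LENGTH_PATTERN, value))
```

The last two samples pass the 10-13 digit pattern but fail the prefix pattern, which is the kind of looseness the "invite errors" concern refers to.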

ryanscholzdvm commented 1 year ago

Another potential option would be to split the phone number element into two alternatives, similar to what we are doing with USAddress and InternationalAddress. You could continue to validate a US phone number strictly, but allow looser validation when someone chooses to use an international phone number, without having to define every possible international phone prefix.
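A rough sketch of that split, in Python for illustration only. Both patterns are assumptions: the strict rule is taken to be "exactly ten digits", and the loose rule (an optional "+" followed by 7 to 15 digits) is hypothetical and is not the pattern the schema uses. `validate_phone` is likewise a made-up helper.

```python
import re

# Assumed strict rule for US numbers: exactly ten digits.
US_PHONE = r"\d{10}"
# Hypothetical loose rule for international numbers (NOT the schema's pattern):
# an optional "+" followed by 7 to 15 digits.
INTL_PHONE = r"\+?\d{7,15}"

def validate_phone(value: str, international: bool = False) -> bool:
    """Apply the strict US rule by default, the loose rule on request."""
    pattern = INTL_PHONE if international else US_PHONE
    return re.fullmatch(pattern, value) is not None

print(validate_phone("5551234567"))
print(validate_phone("15551234567"))
print(validate_phone("+525512345678", international=True))
```

The point of the design is that the US rule stays as strict as it is today, while numbers with a country code go through a separate, more permissive path.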

jconlon commented 1 year ago

@ryanscholzdvm I like the "^(1|52)?(\d{10})$" option, with additional prefixes added if needed.

jconlon commented 1 year ago

Without going into too much of a tangent... take a look at this example

ryanscholzdvm commented 1 year ago

The decision was made at today's meeting to leave the existing Phone number as it currently is, creating a new InternationalPhone element.

mkm1879 commented 7 months ago

I am having trouble validating InternationalPhone Number. The $ at the end of the RegEx is causing it to fail because the value in the attribute does not have an end of line.

Is this a quirk of my editor or an error? This is the only RegEx I find with a terminal $.

jconlon commented 7 months ago

@mkm1879 The $ anchor should detect end of string. This is how we want to test this attribute. In multiline mode it can be used to test for end of line (aka line break) but that is not what we want.
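The end-of-string versus end-of-line distinction can be seen in a general-purpose regex engine. This is a small sketch using Python's `re` module, which is an assumption made for illustration: it is not the XSD regex dialect, and the ten-digit pattern and sample value are made up.

```python
import re

PATTERN = r"^\d{10}$"
value = "5551234567\nsecond line"  # made-up two-line value

# Default mode: $ anchors at the end of the string, so the embedded
# line break prevents a match.
default_mode = re.search(PATTERN, value)

# Multiline mode: $ also matches just before each line break,
# so the first line matches on its own.
multiline_mode = re.search(PATTERN, value, re.MULTILINE)

print(default_mode is None)
print(multiline_mode.group())
```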

mkm1879 commented 7 months ago

I've only tested with oXygen XML's built-in XML Schema RegEx tool so far, using what is in 3.0.

MichaelJRussell commented 7 months ago

I've used regexes often, but not that much in XML validation. Apparently XSD validation implicitly anchors the pattern to the beginning and end of the string it's testing, so there's no need for the ^ and $ characters in the regex. Ref: https://www.w3.org/XML/2008/03/xsdl-regex/re.xml#id2

At a minimum, they're redundant, but I wonder if that's actually causing the validation issues you're seeing, @mkm1879? I haven't looked that closely at the phone number regex.

mkm1879 commented 6 months ago

I've done a little more digging into this issue while working on my "book." It turns out that the XML Schema language definition defines its own variation on regular expressions, and that definition does not include the anchors ^ or $. So some parsers may just ignore them, but others, like mine, try to match a literal $.

From the standard https://www.w3.org/TR/xmlschema-0/#element-pattern:

D Regular Expressions

XML Schema's pattern facet uses a regular expression language that supports Unicode. It is fully described in XML Schema Part 2. The language is similar to the regular expression language used in the Perl Programming language, although expressions are matched against entire lexical representations rather than user-scoped lexical representations such as line and paragraph. For this reason, the expression language does not contain the metacharacters ^ and $, although ^ is used to express exception, e.g. [^0-9]x.
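The difference between the two parser behaviors can be imitated in Python. This is only an emulation under stated assumptions: `implicit_anchor_match` and `literal_anchor_match` are hypothetical helpers, Python's `re` is not an XSD processor, and the pattern is the one proposed earlier in this thread rather than the actual InternationalPhone pattern.

```python
import re

def implicit_anchor_match(pattern: str, value: str) -> bool:
    """Whole-value match, approximating the implicit anchoring of XSD patterns."""
    return re.fullmatch(pattern, value) is not None

def literal_anchor_match(pattern: str, value: str) -> bool:
    """Emulate a processor that reads a leading ^ and a trailing $ as
    ordinary characters. Simplification: only the first and last
    character of the pattern are inspected."""
    literal = pattern
    if literal.startswith("^"):
        literal = r"\^" + literal[1:]
    if literal.endswith("$"):
        literal = literal[:-1] + r"\$"
    return re.fullmatch(literal, value) is not None

ANCHORED = r"^(1|52)?(\d{10})$"  # pattern as proposed earlier in the thread
UNANCHORED = r"(1|52)?\d{10}"    # same rule without explicit anchors

print(literal_anchor_match(ANCHORED, "15551234567"))     # anchors read literally
print(literal_anchor_match(ANCHORED, "^15551234567$"))   # value containing ^ and $
print(implicit_anchor_match(UNANCHORED, "15551234567"))  # anchors dropped
```

Under this emulation, the unanchored pattern behaves the same in both helpers, which is consistent with dropping ^ and $ from patterns in the schema.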