Closed by turbomam 3 years ago
@turbomam Can you be more specific? Chris has some regex in https://github.com/microbiomedata/nmdc-schema/blob/main/src/schema/external_identifiers.yaml for specific identifiers in the pattern slots. Not sure what you are trying to do here.
Might be related to #60
DataHarmonizer doesn't have a regular expression option for validating text or other literals. But we know it could be useful, especially if certain fields hold accession IDs that follow a particular format.
Thanks @ddooley
@dehays I'm going to touch base with you before adding anything else to this issue. Thanks for your input.
This is now implemented in master via https://github.com/cidgoh/DataHarmonizer/pull/224, but not in a release yet. One small question: I've implemented it by passing the given pattern directly into new RegExp(field.pattern). This means one must put the ^...$ anchors around the given regular expression in order to match the whole string content of a field. I presume that's OK with everyone? E.g. this matches an email address: ^\S+@\S+\.\S+$
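To illustrate why the anchors matter, here is a minimal sketch, not the actual DataHarmonizer code. Only `field.pattern` and `new RegExp(...)` come from the comment above; the `matchesPattern` helper and the two sample field objects are hypothetical, and the dot before the last part of the email pattern is assumed to be a literal (escaped) dot.

```javascript
// Hypothetical field specs; only the `pattern` property name comes from the thread.
const anchored = { pattern: "^\\S+@\\S+\\.\\S+$" };
const unanchored = { pattern: "\\S+@\\S+\\.\\S+" };

// Hypothetical helper mirroring the described approach:
// the pattern string is handed to RegExp unchanged.
function matchesPattern(field, value) {
  return new RegExp(field.pattern).test(value);
}

// With ^...$ the whole value has to be an email-like string.
console.log(matchesPattern(anchored, "damion.d@this.that"));              // true
console.log(matchesPattern(anchored, "see damion.d@this.that please"));   // false

// Without anchors, a match anywhere inside the value is enough.
console.log(matchesPattern(unanchored, "see damion.d@this.that please")); // true
```

An alternative design would be for the validator to wrap every pattern in `^(?:...)$` itself, but that would change the meaning of patterns that are meant to match only part of a value, so leaving the anchors to the schema author keeps the behaviour explicit.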
A future improvement would be to be able to name commonly used regular expressions, e.g. "email_address" etc.
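One way the named-pattern idea could look is sketched below. This is purely an assumption about a possible design, not something that exists in DataHarmonizer: the `NAMED_PATTERNS` registry, the `resolvePattern` helper, and the identifier-style pattern in the usage lines are all hypothetical.

```javascript
// Hypothetical registry of commonly used patterns, keyed by name.
const NAMED_PATTERNS = {
  email_address: "^\\S+@\\S+\\.\\S+$",
};

// Hypothetical helper: if field.pattern is a registered name, use the
// registered expression; otherwise treat it as a raw regular expression.
function resolvePattern(field) {
  const isNamed = Object.prototype.hasOwnProperty.call(NAMED_PATTERNS, field.pattern);
  return new RegExp(isNamed ? NAMED_PATTERNS[field.pattern] : field.pattern);
}

// A named pattern and a raw pattern go through the same code path.
console.log(resolvePattern({ pattern: "email_address" }).test("damion.d@this.that")); // true
console.log(resolvePattern({ pattern: "^[A-Z]+:\\d+$" }).test("ENVO:00002297"));       // true
```

In this sketch a registered name takes priority over reading the same string as a raw expression, so names would need to be chosen so they are unlikely to collide with real patterns.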
Example output from above:
Would it also recognize this email format?
damion.d@this.that
Or would the additional "." throw off the expression?
There can be many dots on both sides of the fence. It's still super-permissive, maybe too much, but the official spec allows all sorts of stuff in email addresses.
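A quick check of that permissiveness, as a sketch: it assumes the email pattern from earlier in the thread with the dot before the last part read as a literal dot. The sample addresses and the `isEmailLike` name are made up for illustration.

```javascript
// Email pattern from the thread, assuming a literal dot before the final part.
const emailPattern = new RegExp("^\\S+@\\S+\\.\\S+$");

// Hypothetical helper for readability.
const isEmailLike = (value) => emailPattern.test(value);

// Extra dots on either side of the @ are fine, because \S+ also matches dots.
console.log(isEmailLike("damion.d@this.that"));                      // true
console.log(isEmailLike("first.middle.last@mail.dept.example.org")); // true

// Permissive: \S+ accepts any non-whitespace, including a second @.
console.log(isEmailLike("x@@y.z"));                                  // true

// Rejected: no dot after the @, or whitespace inside the value.
console.log(isEmailLike("user@localhost"));                          // false
console.log(isEmailLike("two words@example.org"));                   // false
```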
Great, I have started using this!
I haven't checked carefully yet. Is there a regular expression validator, for something like identifiers or hand-entered ontology IDs?