cidgoh / DataHarmonizer

A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.
MIT License

regular expression validator? #213

Closed: turbomam closed this issue 3 years ago

turbomam commented 3 years ago

I haven't checked carefully yet. Is there a regular expression validator, for something like identifiers or hand-entered ontology IDs?

dehays commented 3 years ago

@turbomam Can you be more specific? Chris has some regular expressions for specific identifiers in the pattern slots of https://github.com/microbiomedata/nmdc-schema/blob/main/src/schema/external_identifiers.yaml. I'm not sure what you are trying to do here.

Might be related to #60

ddooley commented 3 years ago

DataHarmonizer doesn't have a regular expression option for validating text or other literals. But we know it can be useful, especially if certain fields specify accession IDs that follow a particular format.
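As an illustration of the kind of check being discussed, here is a minimal sketch of per-field pattern validation for a hand-entered ontology ID. The field object shape, the `validateCell` helper, and the CURIE-style pattern are all hypothetical assumptions, not DataHarmonizer's actual code or any real schema's pattern.

```javascript
// Hypothetical field definition for a CURIE-style ontology ID such as "GO:0008150".
// The pattern is an illustrative assumption, not taken from any real schema.
const ontologyIdField = {
  name: "ontology_id",
  pattern: "^[A-Za-z][A-Za-z0-9_]*:\\d+$",
};

// Returns true when the cell value satisfies the field's pattern,
// or when the field defines no pattern at all.
function validateCell(field, value) {
  if (!field.pattern) return true;
  return new RegExp(field.pattern).test(value);
}

for (const value of ["GO:0008150", "GO 0008150", "go:abc"]) {
  console.log(value, validateCell(ontologyIdField, value));
}
```

Only the first value passes: the second has a space instead of a colon, and the third has no digits after the prefix.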

turbomam commented 3 years ago

Thanks @ddooley

@dehays I'm going to touch base with you before adding anything else to this issue. Thanks for your input.

ddooley commented 3 years ago

This is now implemented in master via https://github.com/cidgoh/DataHarmonizer/pull/224, but it is not in a release yet. One small question: I've implemented it by passing the given pattern directly into new RegExp(field.pattern). This means the regex must be wrapped in ^...$ to match the whole field content rather than just part of it. I presume that's OK with everyone? E.g. this matches an email address: ^\S+@\S+\.\S+$
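A minimal sketch of why the ^...$ anchors matter when a pattern string is passed straight to new RegExp(...): RegExp.prototype.test succeeds if the pattern matches anywhere in the string, so an unanchored pattern accepts any value that merely contains a match. The `matches` helper is a hypothetical name used only for this illustration. In the JavaScript string literals, "\\." yields \. in the regex, which matches a literal dot; an unescaped . would match any character.

```javascript
// Mirrors the approach described above: the pattern string is handed
// directly to the RegExp constructor, with no anchors added automatically.
function matches(pattern, value) {
  return new RegExp(pattern).test(value);
}

const unanchored = "\\S+@\\S+\\.\\S+";
const anchored = "^\\S+@\\S+\\.\\S+$";
const value = "contact: someone@example.org (work)";

// test() looks for a match anywhere in the string, so the unanchored
// pattern accepts this value because it merely contains an email address.
console.log(matches(unanchored, value)); // true
// With ^ and $, the entire cell content must be the email address.
console.log(matches(anchored, value)); // false
console.log(matches(anchored, "someone@example.org")); // true
```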

A future improvement would be the ability to name commonly used regular expressions, e.g. "email_address".
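One way such named patterns could work is sketched below, assuming a simple lookup table consulted before the string is handed to new RegExp(...). The `NAMED_PATTERNS` registry, `resolvePattern`, and `validate` are hypothetical; nothing in the thread says DataHarmonizer implements it this way.

```javascript
// Hypothetical registry of named patterns; not part of DataHarmonizer.
const NAMED_PATTERNS = {
  email_address: "^\\S+@\\S+\\.\\S+$",
};

// Resolve a field's pattern: a registered name is replaced by its regex,
// anything else is treated as a raw regular expression string.
function resolvePattern(pattern) {
  return Object.prototype.hasOwnProperty.call(NAMED_PATTERNS, pattern)
    ? NAMED_PATTERNS[pattern]
    : pattern;
}

// Fields without a pattern always pass; otherwise the resolved regex must match.
function validate(field, value) {
  if (!field.pattern) return true;
  return new RegExp(resolvePattern(field.pattern)).test(value);
}

console.log(validate({ pattern: "email_address" }, "a.b@example.org")); // true
console.log(validate({ pattern: "^\\d{4}$" }, "2021")); // true
```

Overloading the same slot for names and raw regexes is ambiguous, since "email_address" is itself a valid regex; a separate, equally hypothetical key for the pattern name would avoid that.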

ddooley commented 3 years ago

Example output from above: [screenshot not preserved]

cmrn-rhi commented 3 years ago

Would it also recognize an address like damion.d@this.that, or would the additional "." throw off the expression?

ddooley commented 3 years ago

There can be many dots on both sides of the @. It's still super-permissive, maybe too much so, but the official spec allows all sorts of things in email addresses.
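To make the answer concrete, here is a quick check of how an anchored pattern of this shape behaves on the addresses discussed here. It assumes the dot before the last segment is escaped (\.); a regex literal is used instead of new RegExp(...) only for brevity, and the sample addresses are made up.

```javascript
// The permissive email pattern from this thread, with the dot escaped.
const EMAIL = /^\S+@\S+\.\S+$/;

const samples = [
  "damion.d@this.that",                // dots before and after the @
  "first.middle.last@sub.example.org", // several dots on each side
  "a@@b.c",                            // still accepted: \S+ can absorb the extra @
  "no spaces@example.org",             // rejected: \S excludes whitespace
  "missing-at.example.org",            // rejected: no @ at all
];

for (const s of samples) {
  console.log(`${s} -> ${EMAIL.test(s)}`);
}
```

The first three pass and the last two fail, which illustrates both points above: extra dots are fine, and the pattern is permissive enough to accept a doubled @.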

turbomam commented 3 years ago

Great, I have started using this!