Open tznind opened 3 years ago
There's a RegexDict package heading for NuGet soon that should help here: conceptually a sort of Dictionary<Regex,T> that knows to turn "^foo$" into a simple dictionary entry internally, plus some other tweaks. It's probably a week or so from release.
Cool. I think we have a lot of patterns that are generated just with Regex.Escape(...), so it might also need to deal with escaped characters such as "\ " (literal space), "\(" and "\." (if possible). I can pull a sample from the ignore rules if that would help.
Samples would be handy. I'm starting with the simplest three cases: ^foo$, ^foo and foo$ (dictionary, prefix tree and suffix tree respectively), with and without case sensitivity. Next up is probably turning alternations like (foo|bar) into two entries, one for foo and one for bar. It's easy to extend support later, though, so I'll focus on the basic cases first.
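To make the three cases concrete, here is a minimal sketch of how a pattern might be sorted into exact, prefix or suffix form. It is not RegexDict's API: `PatternShape`, `PatternClassifier` and `Classify` are hypothetical names, and the sketch only labels each pattern rather than building the dictionary or trees. It assumes the body of the pattern was produced by Regex.Escape, and checks that by unescaping and re-escaping it; anything that does not round-trip is treated as a real regex.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical labels for the three simple cases plus "everything else".
public enum PatternShape { Exact, Prefix, Suffix, Other }

public static class PatternClassifier
{
    // Classifies a pattern by its anchors and returns the unescaped literal.
    public static PatternShape Classify(string pattern, out string literal)
    {
        bool startAnchored = pattern.StartsWith("^", StringComparison.Ordinal);
        bool endAnchored = pattern.EndsWith("$", StringComparison.Ordinal);

        int start = startAnchored ? 1 : 0;
        int length = pattern.Length - start - (endAnchored ? 1 : 0);
        string body = pattern.Substring(start, length);

        // Only a pure literal body qualifies for the fast paths.
        if (!TryUnescapeLiteral(body, out literal)) return PatternShape.Other;

        if (startAnchored && endAnchored) return PatternShape.Exact;  // ^foo$
        if (startAnchored) return PatternShape.Prefix;                // ^foo
        if (endAnchored) return PatternShape.Suffix;                  // foo$
        return PatternShape.Other;                                    // unanchored
    }

    // True when 'body' is exactly what Regex.Escape would emit for some string,
    // so it contains no active metacharacters.
    private static bool TryUnescapeLiteral(string body, out string literal)
    {
        literal = "";
        try
        {
            string candidate = Regex.Unescape(body);
            if (Regex.Escape(candidate) != body) return false;
            literal = candidate;
            return true;
        }
        catch (ArgumentException)
        {
            // Regex.Unescape rejects escape sequences it does not recognise.
            return false;
        }
    }
}
```

With this sketch, a pattern such as ^fo+$ falls through to Other, because re-escaping "fo+" gives "fo\+" rather than the original body. Case-insensitive rules would need a separate set of containers with a case-insensitive comparer.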
Looks like you can use Regex.Unescape(String) to reverse the escaping.
Here are some samples:
^MR\ -\ LWS$
^PELVIS\^BG\ -\ Bony\ Pelvis$
^MRI\ BRAIN\ \ WITH\ GENERAL\ ANAESTHETIC$
^MR\ GINOCCHIO\ DX$
^CNPM$
^0000009510$
^MABDO\nMWSPN$
^MLSPC\nMAPEL$
^MHACR\nMHANR$
^MLLCR\nMLOLR$
^XZSCANCDFR$
^MCOWI\nMSKCH$
^MCOWI$
^MCVVS\nMCORV$
^MTHCR\nMTHIR$
^MSKCH\nMNECK$
^MR\ SLDI\ SPINE$
^MCERA$
^MWRCR\nMWRIR$
^MJHHIR$
^MREV$
^MRST$
^MCHEC\nMCHES$
^ZMSKSPNC$
^MCSPC\nMSCTH$
^MCSPN\nMSCTHC$
^MJHSHR$
^MRGK$
^MSHLR\nMELBR$
^ZFNMRN$
^MNECK\nMNECC$
^MAPEL\nMAPEL$
^MADRC$
^MHACL\nMHACR$
^ITSA$
^ZMSKSPN$
^MHCM$
^MRIHBMT$
^DICOM$
^MRIBR$
^MRABDOMEN$
^MRNC$
^MR\ pelvis$
^LSWO$
^IAVC$
^MRI\ KIDN$
^MRKO$
^8CH\ POST\ OP\ FOLL$
^BRAINSPINE$
^NASH_COMPLETE_PR$
^CBT_V11$
^MR_\ BEYIN$
^PELVISLOWEREXTR$
^IRM\ RETROPERITOI$
^8CH\ 72HR\ WYETH\ S$
^8CH\ 30DAY\ WYETH$
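As a rough illustration of the Regex.Unescape suggestion applied to samples like these, the sketch below strips the anchors and undoes the escaping. `EscapedRule`, `ToLiteral` and `FromLiteral` are hypothetical names, and the code assumes each rule was built as "^" + Regex.Escape(value) + "$", which the thread suggests but does not confirm for every rule.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedRule
{
    // Removes the leading ^ and trailing $, then undoes Regex.Escape on the rest.
    public static string ToLiteral(string anchoredPattern)
    {
        if (anchoredPattern.Length < 2
            || anchoredPattern[0] != '^'
            || anchoredPattern[anchoredPattern.Length - 1] != '$')
        {
            throw new ArgumentException("Expected a pattern of the form ^...$", nameof(anchoredPattern));
        }

        string body = anchoredPattern.Substring(1, anchoredPattern.Length - 2);
        return Regex.Unescape(body);
    }

    // Rebuilds a rule from a literal value (assumed generation scheme).
    public static string FromLiteral(string value)
    {
        return "^" + Regex.Escape(value) + "$";
    }
}
```

For example, ToLiteral(@"^MR\ -\ LWS$") gives "MR - LWS", and in ^MABDO\nMWSPN$ the "\n" becomes a real line feed character. Comparing FromLiteral(ToLiteral(rule)) with the original rule is a cheap way to confirm that a rule really is a plain escaped literal before treating it as one.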
We should look into ways to improve performance. Currently there are a lot of exact-match regular expressions, e.g.
^bob$
(in field x). These should be grouped together and simplified into a HashSet of strings, so that a value can be looked up in the set quickly rather than by sequentially applying every rule/regex one after the other.
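A minimal sketch of that grouping, under some assumptions: `GroupedRuleMatcher` is a hypothetical class, all rules apply to the same field, and all rules are case-sensitive with default RegexOptions. Rules of the form ^literal$ go into a HashSet; everything else stays a Regex and is still applied sequentially. One behavioural difference to be aware of: in .NET, $ also matches just before a final newline, so ^bob$ matches "bob\n" while the set lookup does not.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed class GroupedRuleMatcher
{
    private readonly HashSet<string> _exactValues = new HashSet<string>(StringComparer.Ordinal);
    private readonly List<Regex> _otherRules = new List<Regex>();

    public GroupedRuleMatcher(IEnumerable<string> patterns)
    {
        foreach (string pattern in patterns)
        {
            if (TryGetExactLiteral(pattern, out string literal))
                _exactValues.Add(literal);           // fast path: one hash lookup
            else
                _otherRules.Add(new Regex(pattern)); // slow path: kept as a regex
        }
    }

    public int ExactCount => _exactValues.Count;
    public int FallbackCount => _otherRules.Count;

    public bool IsMatch(string value)
    {
        if (_exactValues.Contains(value)) return true;

        foreach (Regex rule in _otherRules)
        {
            if (rule.IsMatch(value)) return true;
        }
        return false;
    }

    // Accepts only ^...$ patterns whose body round-trips through
    // Regex.Unescape / Regex.Escape, i.e. contains no active metacharacters.
    private static bool TryGetExactLiteral(string pattern, out string literal)
    {
        literal = "";
        if (pattern.Length < 2 || pattern[0] != '^' || pattern[pattern.Length - 1] != '$')
            return false;

        string body = pattern.Substring(1, pattern.Length - 2);
        try
        {
            string candidate = Regex.Unescape(body);
            if (Regex.Escape(candidate) != body) return false;
            literal = candidate;
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }
}
```

Constructing the matcher once from the rule list and calling IsMatch per value keeps the cost of the exact-match rules independent of how many there are; only the fallback list is still scanned one rule at a time.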