Open otac0n opened 13 years ago
Does using RegexOptions.ECMAScript help?
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions(v=VS.100).aspx
@hakanson: We are already using the ECMAScript option, which works well for the most part. It is just this little piece that is different.
I think this is something we'll have to live with for now. Writing a custom regular expression implementation for this small detail is too much work for too little gain at the moment. I'll leave the ticket open, and we'll look into it eventually.
-1 to me for not looking at the code in Core.fs:
let options = (options ||| RegexOptions.ECMAScript) &&& ~~~RegexOptions.Compiled
let key = (options, pattern)
this.RegExp <- env.RegExpCache.Lookup key (fun () -> new Regex(pattern, options ||| RegexOptions.Compiled))
I'm new to F#; does this mean you are implementing your own cache of compiled RegExp objects? I ask because there is a Regex.CacheSize property that controls an internal cache of compiled regular expressions. I assume having your own cache gave you more control, but I thought I would mention it for completeness (at the risk of looking uninformed a second time on the same issue).
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.cachesize.aspx
Yes, we do maintain our own regexp cache; we found it to be faster.
We found that in a loop like this...
while (true)
{
var r = new RegExp("...");
}
...that .NET's regex cache was not helping.
When we implemented the regexp cache shown above, we saw a 50% reduction in the time on the SunSpider regexp test.
@otac0n - From the looks of it, the BCL only caches regular expressions used through the static methods on the Regex class, so the increase in performance makes sense.
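To make the caching idea above concrete, here is a minimal sketch in JavaScript of a lookup keyed by options and pattern, loosely mirroring the shape of the `Lookup key (fun () -> ...)` call in the F# snippet. The names `RegExpCache`, `lookup`, and `cachedRegExp` are hypothetical and are not taken from the IronJS source. Sharing a JavaScript RegExp object also shares its `lastIndex` state, so this only illustrates the lookup pattern, not how an engine should implement `new RegExp`.

```javascript
// Hypothetical sketch of a compile-once cache; not the IronJS implementation.
class RegExpCache {
  constructor() {
    this.entries = new Map();
  }

  // Return the cached value for `key`, calling `create` only on a miss.
  lookup(key, create) {
    if (!this.entries.has(key)) {
      this.entries.set(key, create());
    }
    return this.entries.get(key);
  }
}

const cache = new RegExpCache();

// Build the key from both the flags and the pattern, so the same pattern
// with different options gets its own entry.
function cachedRegExp(pattern, flags = "") {
  const key = JSON.stringify([flags, pattern]);
  return cache.lookup(key, () => new RegExp(pattern, flags));
}

// Repeated requests for the same pattern reuse one compiled object.
const first = cachedRegExp("a+b", "i");
const second = cachedRegExp("a+b", "i");
console.log(first === second); // true
```

In a loop like the one quoted above, every iteration after the first would then hit the cache instead of compiling the pattern again.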
For a regular expression such as this: ((a+)?(b+)?c+)*
There are 3 capturing groups (one for each left parenthesis).
If it is matched against a string like the following: bbbccaac
The .NET implementation will list the following capture groups:
((a+)?(b+)?c+) = "aac"
(a+) = "aa"
(b+) = "bbb"
Whereas the ECMAScript spec specifies the following capturing behavior:
((a+)?(b+)?c+) = "aac"
(a+) = "aa"
(b+) = undefined
The .NET implementation gives no indication that the (b+) capturing group did not participate in its most recent match attempt.
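For reference, the ECMAScript behavior described here can be checked in any spec-compliant JavaScript engine. The short sketch below uses the pattern and input from this issue. The variable names are just for illustration.

```javascript
// Each iteration of the outer (...)* resets the captures nested inside it,
// so (b+) from the first iteration ("bbb") does not survive into the last one.
const match = /((a+)?(b+)?c+)*/.exec("bbbccaac");

const outer = match[1];  // last iteration of the outer group
const aGroup = match[2]; // (a+) in the last iteration
const bGroup = match[3]; // (b+) did not participate in the last iteration

console.log(outer);  // "aac"
console.log(aGroup); // "aa"
console.log(bGroup); // undefined
```

The last value is the one where .NET with RegexOptions.ECMAScript reports "bbb" instead, as described above.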