bwrrp / xspattern.js

XML Schema pattern (regular expression) engine
MIT License
10 stars 2 forks source link

Patterns with very large minoccurs crash #11

Open DrRataplan opened 4 years ago

DrRataplan commented 4 years ago

The following test from the QT3 test set that FontoXPath uses causes a crash:

 <test-case name="cbcl-matches-038">
      <description> test a large exact quantifier </description>
      <created by="Tim Mills" on="2008-07-17"/>      
      <test>fn:matches('aaa', 'a{2147483647}')</test>
      <result>
         <assert-false/>
      </result>
   </test-case>

It runs out of memory when compiling the huge program for the a{2147483647} pattern. We could try to detect these patalogically large minoccurs, compile the as if they were a+ and go over the whynot traces to see whether our minOccurs matches?

devatwork commented 4 years ago

Another common optimization is to compute the minimum length of any valid input, e.g. 2147483647 in the provided example, during parse. If the length of the input, aaa is less than the minimum required input there is no way it can match so we shouldn't even attempt to do so.

devatwork commented 4 years ago

See https://github.com/dotnet/runtime/issues/1349 and https://github.com/dotnet/runtime/pull/1348 for some recent optimizations made to the .NET Regex engine. Most of them are irrelevant but there may be some insight in those issues.