When the ExtractingParseObserver matches a URL embedded in CSS as url(...) that is escaped with 8192 single quotes before and after, the regex matcher recurses until it overflows the stack. See wat_wet_stack_overflow_test.warc.gz for the problematic WARC record.
java.lang.StackOverflowError
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$Ques.match(Pattern.java:4182)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$Ques.match(Pattern.java:4182)
... (16000 lines stripped)
at java.util.regex.Pattern$Branch.match(Pattern.java:4604)
at java.util.regex.Pattern$Start.match(Pattern.java:3461)
at java.util.regex.Matcher.search(Matcher.java:1248)
at java.util.regex.Matcher.find(Matcher.java:637)
at java.util.regex.Matcher.replaceAll(Matcher.java:951)
at org.archive.resource.html.ExtractingParseObserver.patternCSSExtract(ExtractingParseObserver.java:485)
at org.archive.resource.html.ExtractingParseObserver.handleStyleNode(ExtractingParseObserver.java:233)
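A possible stopgap, sketched under assumptions: the trace above shows the overflow happening inside java.util.regex while a repeated group is matched recursively, so the caller could catch StackOverflowError around the regex call and skip the offending style block instead of aborting the whole record. The class name GuardedCssExtract, the method extractUrls, and the pattern CSS_URL below are hypothetical stand-ins for illustration; they are not the code or the pattern used in ExtractingParseObserver.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the project: guards a recursive regex match.
final class GuardedCssExtract {

    // Assumed stand-in pattern (NOT the one in ExtractingParseObserver):
    // url( + optionally backslash-escaped quotes + URL + quotes + ).
    // The repeated group (?:\\?['"])* is the kind of construct that
    // java.util.regex matches recursively, one set of frames per repetition.
    private static final Pattern CSS_URL = Pattern.compile(
            "url\\(((?:\\\\?['\"])*)([^'\"\\\\)]*)((?:\\\\?['\"])*)\\)");

    // Returns the URLs found in the CSS text, or an empty list if the
    // matcher runs out of stack on pathological input.
    static List<String> extractUrls(String css) {
        List<String> urls = new ArrayList<>();
        try {
            Matcher m = CSS_URL.matcher(css);
            while (m.find()) {
                urls.add(m.group(2));
            }
        } catch (StackOverflowError e) {
            // Deeply nested regex recursion: give up on this block only.
            return Collections.emptyList();
        }
        return urls;
    }
}
```

With ordinary input such as url('http://example.com/a.png') this returns the single URL. With thousands of quotes around the URL it returns either the URL or an empty list, depending on the thread's stack size, but it does not propagate the error. Catching StackOverflowError only hides the symptom; rewriting the pattern so the quote run is not matched by a repeated group, or capping the input length, would address the cause.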