elm / regex

If you really need regex in Elm, it is possible.
https://package.elm-lang.org/packages/elm/regex/latest/
BSD 3-Clause "New" or "Revised" License

JavaScript heap out of memory error with empty string regex #6

Open tesk9 opened 6 years ago

tesk9 commented 6 years ago

Splitting on a Regex created from an empty string ("") leads to a page crash.

> import Regex
> everyCharacter = Maybe.withDefault Regex.never (Regex.fromString "")
{} : Regex.Regex
> Regex.split everyCharacter ""

<--- Last few GCs --->

[14137:0x104002a00]     6566 ms: Mark-sweep 577.3 (584.5) -> 577.3 (581.5) MB, 292.9 / 0.0 ms  (average mu = 0.474, current mu = 0.000) last resort GC in old space requested
[14137:0x104002a00]     6860 ms: Mark-sweep 577.3 (581.5) -> 577.3 (581.5) MB, 294.5 / 0.0 ms  (average mu = 0.307, current mu = 0.000) last resort GC in old space requested

<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x1ca8ebd5c01d]
Security context: 0x1cf1ef21e681 <JSObject>
    1: push [0x1cf1ef2057f1](this=0x1cf1e3004a49 <JSArray[75209227]>,0x1cf12c0029f1 <String[0]: >)
    2: /* anonymous */(aka /* anonymous */) [0x1cf1e3004a69] [/Users/tessakelly/Documents/elmoji-translator/elm-stuff/0.19.0/temp.js:~861] [pc=0x1ca8ebded7f2](this=0x1cf12c0026f1 <undefined>,n=0x1cf1e302d889 <Number inf>,re=0x1cf1e3004891 <JSRegExp <String[4]: ...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: 0x10003907e node::Abort() [/usr/local/bin/node]
 2: 0x10003924f node::OnFatalError(char const*, char const*) [/usr/local/bin/node]
 3: 0x10019064b v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 4: 0x1001905ec v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 5: 0x10043fdb4 v8::internal::Heap::UpdateSurvivalStatistics(int) [/usr/local/bin/node]
 6: 0x1004465d2 v8::internal::Heap::SetUp() [/usr/local/bin/node]
 7: 0x10042685d v8::internal::Factory::AllocateRawArray(int, v8::internal::PretenureFlag) [/usr/local/bin/node]
 8: 0x10042625d v8::internal::Factory::NewFixedArrayWithFiller(v8::internal::Heap::RootListIndex, int, v8::internal::Object*, v8::internal::PretenureFlag) [/usr/local/bin/node]
 9: 0x1003e58f2 v8::internal::(anonymous namespace)::ElementsAccessorBase<v8::internal::(anonymous namespace)::FastPackedObjectElementsAccessor, v8::internal::(anonymous namespace)::ElementsKindTraits<(v8::internal::ElementsKind)2> >::ConvertElementsWithCapacity(v8::internal::Handle<v8::internal::JSObject>, v8::internal::Handle<v8::internal::FixedArrayBase>, v8::internal::ElementsKind, unsigned int, unsigned int, unsigned int, int) [/usr/local/bin/node]
10: 0x1003e57a5 v8::internal::(anonymous namespace)::ElementsAccessorBase<v8::internal::(anonymous namespace)::FastPackedObjectElementsAccessor, v8::internal::(anonymous namespace)::ElementsKindTraits<(v8::internal::ElementsKind)2> >::GrowCapacityAndConvertImpl(v8::internal::Handle<v8::internal::JSObject>, unsigned int) [/usr/local/bin/node]
11: 0x1003e43fc v8::internal::(anonymous namespace)::ElementsAccessorBase<v8::internal::(anonymous namespace)::FastPackedObjectElementsAccessor, v8::internal::(anonymous namespace)::ElementsKindTraits<(v8::internal::ElementsKind)2> >::Add(v8::internal::Handle<v8::internal::JSObject>, unsigned int, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, unsigned int) [/usr/local/bin/node]
12: 0x10050ade4 v8::internal::JSObject::AddDataElement(v8::internal::Handle<v8::internal::JSObject>, unsigned int, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::ShouldThrow) [/usr/local/bin/node]
13: 0x10061a73d v8::internal::Runtime::SetObjectProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::LanguageMode) [/usr/local/bin/node]
14: 0x10061d6db v8::internal::Runtime_SetProperty(int, v8::internal::Object**, v8::internal::Isolate*) [/usr/local/bin/node]
15: 0x1ca8ebd5c01d

I would expect the behavior to match Regex.never:

> import Regex
> Regex.split Regex.never ""
[""] : List String

Herteby commented 5 years ago

I just found this out too, here's an Ellie repro: https://ellie-app.com/4wHwRvs79mSa1

perkee commented 2 months ago

This also affects splitting on word boundaries with Maybe.withDefault Regex.never (Regex.fromString "\\b"). The issue is in the while loop in _Regex_splitAtMost: the assignment start = re.lastIndex; assumes that start will increase by a nonzero amount, but re.lastIndex is zero in this case. For what it's worth, this works with the native JS implementation: 'word-kebab'.split(/\b/) produces [ 'word', '-', 'kebab' ]. My thanks to @brian-carroll for finding this.

@lydell also pointed out that, per ECMA, the pointer should move forward by AdvanceStringIndex when the match has zero width, which basically advances the pointer by one and then keeps going to the end of the code point. Of course, we're not making an ECMA-compliant language, but it's a good reference for reimplementing regex split matching.
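
To make the suggestion above concrete, here is a minimal sketch of a split loop that moves forward on zero-width matches. It is not the kernel code from elm/regex: splitAtMost and advanceStringIndex are hypothetical standalone names, the function returns a plain JS array rather than an Elm List, and it assumes the regex carries the global flag. Skipping empty matches at the current split point and at the end of the string is one possible design choice, borrowed from how String.prototype.split behaves.

```javascript
// Hypothetical helper modeled on the spec's AdvanceStringIndex: step one
// code unit, or two when a unicode-mode regex sits on a surrogate pair.
function advanceStringIndex(str, index, unicode) {
  if (!unicode || index + 1 >= str.length) return index + 1;
  const first = str.charCodeAt(index);
  if (first < 0xd800 || first > 0xdbff) return index + 1;
  const second = str.charCodeAt(index + 1);
  return second >= 0xdc00 && second <= 0xdfff ? index + 2 : index + 1;
}

// Hypothetical split loop: performs at most n splits of str on matches of re.
function splitAtMost(n, re, str) {
  if (!re.global) throw new Error("this sketch assumes a global regex");
  const out = [];
  const restoreLastIndex = re.lastIndex;
  let start = re.lastIndex;
  while (n > 0) {
    const result = re.exec(str);
    if (!result) break;
    if (result[0].length === 0) {
      // exec leaves lastIndex on a zero-width match, so advance it by hand.
      re.lastIndex = advanceStringIndex(str, result.index, re.unicode);
      // Ignore empty matches at the current split point or at the end of
      // the string, so no empty pieces are produced.
      if (result.index === start || result.index >= str.length) continue;
    }
    out.push(str.slice(start, result.index));
    start = result.index + result[0].length;
    n--;
  }
  out.push(str.slice(start));
  re.lastIndex = restoreLastIndex;
  return out;
}

console.log(splitAtMost(Infinity, /(?:)/g, ""));         // [ '' ]
console.log(splitAtMost(Infinity, /\b/g, "word-kebab")); // [ 'word', '-', 'kebab' ]
console.log(splitAtMost(1, /,/g, "a,b,c"));              // [ 'a', 'b,c' ]
```

With this approach the empty-pattern case terminates and returns [""], the same result the original report expected from Regex.never.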