Closed jmaebe closed 1 year ago
Here are the call stacks (v3.0.10).
First:
#0 0x00007fff6292e2da in __fread () from /usr/lib/system/libsystem_c.dylib
#1 0x00007fff6292e127 in fread () from /usr/lib/system/libsystem_c.dylib
#2 0x00000001002059b4 in reflex::Input::file_init (this=0x7ffeefbfe640, enc=0) at ../../lib/input.cpp:676
#3 0x00000001001b7b2f in reflex::Input::init (this=0x7ffeefbfe640, enc=0) at /Data/dev/osiris/re-flex/include/reflex/input.h:743
#4 0x00000001001b7ab7 in reflex::Input::Input (this=0x7ffeefbfe640, file=0x7fff98ed30c8) at /Data/dev/osiris/re-flex/include/reflex/input.h:439
#5 0x00000001001b797d in reflex::Input::Input (this=0x7ffeefbfe640, file=0x7fff98ed30c8) at /Data/dev/osiris/re-flex/include/reflex/input.h:438
#6 0x000000010019f0a8 in reflex::AbstractLexer<reflex::Matcher>::in<__sFILE*> (this=0x100265270 <YY_SCANNER>, input=@0x10025a0e0: 0x7fff98ed30c8) at /Data/dev/osiris/re-flex/include/reflex/abslexer.h:131
Second:
#0 0x00007fff6292e2da in __fread () from /usr/lib/system/libsystem_c.dylib
#1 0x00007fff6292e127 in fread () from /usr/lib/system/libsystem_c.dylib
#2 0x00000001002059b4 in reflex::Input::file_init (this=0x7ffeefbfe5f8, enc=0) at ../../lib/input.cpp:676
#3 0x00000001001b7b2f in reflex::Input::init (this=0x7ffeefbfe5f8, enc=0) at /Data/dev/osiris/re-flex/include/reflex/input.h:743
#4 0x00000001001b7ab7 in reflex::Input::Input (this=0x7ffeefbfe5f8, file=0x7fff98ed30c8) at /Data/dev/osiris/re-flex/include/reflex/input.h:439
#5 0x00000001001b797d in reflex::Input::Input (this=0x7ffeefbfe5f8, file=0x7fff98ed30c8) at /Data/dev/osiris/re-flex/include/reflex/input.h:438
#6 0x000000010019f115 in reflex::AbstractLexer<reflex::Matcher>::in<__sFILE*> (this=0x100265270 <YY_SCANNER>, input=@0x10025a0e0: 0x7fff98ed30c8) at /Data/dev/osiris/re-flex/include/reflex/abslexer.h:133
The issue is with this code:
If you parse a FILE* fh_Input, then use rewind(fh_Input); followed by yyrestart(fh_Input);, the above code will result in the Input(FILE *file) constructor getting called twice (lines (1) and (2)). In both cases, this constructor will call init(), which uses fread to check for a UTF* signature: the first call reads the first byte of the file and the second call reads the second byte. This means that the first character of the file gets lost after the yyrestart (once the actual lexing starts, the second character is cached in utf8_[0] instead of the first one).

Interestingly, I've only encountered this issue under macOS and not under Windows. I did not debug it under Windows to see why it works there.
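To illustrate the effect in isolation, here is a minimal sketch that does not use RE/flex at all. PeekingInput is a hypothetical stand-in that, like the constructor described above, reads one byte from the FILE* to probe for a signature and caches it. The names PeekingInput and read_after_constructions are made up for this example, and the real reflex::Input does more than this. The sketch only shows that constructing such a wrapper twice on the same stream drops the first byte.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for an input wrapper. This is NOT reflex::Input;
// it only mimics "read one byte in the constructor and cache it".
struct PeekingInput {
  std::FILE* file;
  unsigned char cached = 0;  // byte consumed while probing for a signature
  bool has_cached = false;

  explicit PeekingInput(std::FILE* f) : file(f) {
    // Probe: consume one byte from the stream and remember it.
    if (file != nullptr && std::fread(&cached, 1, 1, file) == 1)
      has_cached = true;
  }

  // Return the cached byte followed by the rest of the stream.
  std::string read_all() {
    std::string out;
    if (has_cached) out.push_back(static_cast<char>(cached));
    int c;
    while ((c = std::fgetc(file)) != EOF) out.push_back(static_cast<char>(c));
    return out;
  }
};

// Write "abc" to a temporary file, rewind it, construct the wrapper
// `constructions` times on the same FILE*, and return what the last
// wrapper sees. Returns an empty string if no temporary file is available.
std::string read_after_constructions(int constructions) {
  std::FILE* f = std::tmpfile();
  if (f == nullptr) return std::string();
  std::fputs("abc", f);
  std::rewind(f);

  PeekingInput in(f);
  for (int i = 1; i < constructions; ++i)
    in = PeekingInput(f);  // each extra construction consumes another byte

  std::string result = in.read_all();
  std::fclose(f);
  return result;
}
```

With a single construction the wrapper sees the whole file. With two constructions the first probe's byte is discarded along with the first wrapper, so the result starts at the second character, which matches the symptom I see after yyrestart.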
It's a proprietary lexer so I can't share the code, but it has the following options:
and is generated with
reflex --flex --bison