AvailLang / Avail

The Avail programming language. Includes the virtual machine, standard library, and standard examples.
BSD 3-Clause "New" or "Revised" License
54 stars 5 forks source link

Bad custom lexers break... most things #157

Open tdhsmith opened 4 years ago

tdhsmith commented 4 years ago

I'll start by acknowledging that custom lexers obviously live much more towards the "power user side" of the current library capabilities. It's likely an open question whether making them more user-friendly is worth any current effort or not, and whether any misbehavior should attempt to be diagnosed or simply left in undefined behavior territory.


It's not hard to write a bad lexer, but here's a particularly ignorant one:

Lexer $"bad lexer"
when [ x : any | true ]
is
[
    source : string,
    position : real source position,
    line : real source line number
|
    // Always return a "keyword" made up of the *entire* source text.
    // (Don't try this at home.)
    {<keyword (source) @ position:line>}
];

In normal operation (without assertions) attempting to build a module with access to this lexer doesn't necessarily cause issues on its own, but it mucks up the error reporting, causing the compiler diagnostics to crash and surfacing a rather opaque ArrayIndexOutOfBoundsException within an Avail descriptor implementation, from attempting to copy a slice of source text beyond its length.

ArrayIndexOutOfBoundsException crashing workbench
Internal error: 111
java.lang.ArrayIndexOutOfBoundsException: 111
    at com.avail.descriptor.AvailObjectRepresentation.byteSlot(AvailObjectRepresentation.java:276)
    at com.avail.descriptor.ByteStringDescriptor.lambda$o_CopyTupleFromToCanDestroy$1(ByteStringDescriptor.java:424)
    at com.avail.descriptor.ByteStringDescriptor.generateByteString(ByteStringDescriptor.java:532)
    at com.avail.descriptor.ByteStringDescriptor.o_CopyTupleFromToCanDestroy(ByteStringDescriptor.java:423)
    at com.avail.descriptor.AvailObject.copyStringFromToCanDestroy(AvailObject.java:1335)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList.reportGroupedErrors(CompilerDiagnostics.kt:439)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList.access$reportGroupedErrors(CompilerDiagnostics.kt:280)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList$reportError$1.invoke(CompilerDiagnostics.kt:559)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList$reportError$1.invoke(CompilerDiagnostics.kt:280)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList.accumulateErrorsThen(CompilerDiagnostics.kt:620)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList.access$accumulateErrorsThen(CompilerDiagnostics.kt:280)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList$accumulateErrorsThen$2.invoke(CompilerDiagnostics.kt:647)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList$accumulateErrorsThen$2.invoke(CompilerDiagnostics.kt:280)
    at com.avail.compiler.problems.CompilerDiagnostics$Companion.findLongestTokenThen(CompilerDiagnostics.kt:941)
    at com.avail.compiler.problems.CompilerDiagnostics$Companion.access$findLongestTokenThen(CompilerDiagnostics.kt:860)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList.accumulateErrorsThen(CompilerDiagnostics.kt:636)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList.access$accumulateErrorsThen(CompilerDiagnostics.kt:280)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList$accumulateErrorsThen$2.invoke(CompilerDiagnostics.kt:647)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList$accumulateErrorsThen$2.invoke(CompilerDiagnostics.kt:280)
    at com.avail.compiler.problems.CompilerDiagnostics$Companion.findLongestTokenThen(CompilerDiagnostics.kt:941)
    at com.avail.compiler.problems.CompilerDiagnostics$Companion.access$findLongestTokenThen(CompilerDiagnostics.kt:860)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList.accumulateErrorsThen(CompilerDiagnostics.kt:636)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList.access$accumulateErrorsThen(CompilerDiagnostics.kt:280)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList$accumulateErrorsThen$2.invoke(CompilerDiagnostics.kt:647)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList$accumulateErrorsThen$2.invoke(CompilerDiagnostics.kt:280)
    at com.avail.compiler.problems.CompilerDiagnostics$Companion.findLongestTokenThen(CompilerDiagnostics.kt:941)
    at com.avail.compiler.problems.CompilerDiagnostics$Companion.access$findLongestTokenThen(CompilerDiagnostics.kt:860)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList.accumulateErrorsThen(CompilerDiagnostics.kt:636)
    at com.avail.compiler.problems.CompilerDiagnostics$ExpectationsList.reportError$avail(CompilerDiagnostics.kt:555)
    at com.avail.compiler.problems.CompilerDiagnostics.reportError(CompilerDiagnostics.kt:838)
    at com.avail.compiler.AvailCompiler$tryIfUnambiguousThen$1.invoke(AvailCompiler.kt:394)
    at com.avail.compiler.AvailCompiler$tryIfUnambiguousThen$1.invoke(AvailCompiler.kt:199)
    at com.avail.compiler.CompilationContext$noMoreWorkUnits$3.invoke(CompilationContext.kt:203)
    at com.avail.compiler.CompilationContext$noMoreWorkUnits$3.invoke(CompilationContext.kt:126)
    at com.avail.compiler.CompilationContext$workUnitCompletion$1.invoke(CompilationContext.kt:439)
    at com.avail.compiler.CompilationContext$workUnitCompletion$1.invoke(CompilationContext.kt:126)
    at com.avail.compiler.scanning.LexingState$sam$com_avail_utility_evaluation_Continuation1NotNull$0.value(LexingState.kt)
    at com.avail.AvailTask.lambda$forFiberResumption$2(AvailTask.java:160)
    at com.avail.AvailRuntime.lambda$whenLevelOneUnsafeDo$3(AvailRuntime.java:1632)
    at com.avail.AvailTask.run(AvailTask.java:259)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

With assertions, the lexer tends to fail during its operation. In my case it tended to trip up LexingState on an assertion that its position was not beyond the end of the source (below).

AssertionError in lexing state
Unexpected internal failure in AvailTask:
java.lang.AssertionError: Assertion failed
    at com.avail.compiler.scanning.LexingState.withTokensDo(LexingState.kt:206)
    at com.avail.compiler.AvailCompiler$Companion.skipWhitespaceAndComments(AvailCompiler.kt:4611)
    at com.avail.compiler.AvailCompiler$Companion.skipWhitespaceAndComments(AvailCompiler.kt:4567)
    at com.avail.compiler.AvailCompiler$Companion.access$skipWhitespaceAndComments(AvailCompiler.kt:4160)
    at com.avail.compiler.AvailCompiler.attemptToConsumeToken(AvailCompiler.kt:1488)
    at com.avail.compiler.AvailCompiler.parseRestOfSendNode(AvailCompiler.kt:1285)
    at com.avail.compiler.AvailCompiler.access$parseRestOfSendNode(AvailCompiler.kt:199)
    at com.avail.compiler.AvailCompiler$eventuallyParseRestOfSendNode$1.invoke(AvailCompiler.kt:4100)
    at com.avail.compiler.AvailCompiler$eventuallyParseRestOfSendNode$1.invoke(AvailCompiler.kt:199)
    at com.avail.compiler.CompilationContext$workUnitCompletion$1.invoke(CompilationContext.kt:395)
    at com.avail.compiler.CompilationContext$workUnitCompletion$1.invoke(CompilationContext.kt:126)
    at com.avail.compiler.CompilationContext$workUnitsDo$1.value(CompilationContext.kt:489)
    at com.avail.AvailTask.run(AvailTask.java:259)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

However I suspect this is hardly the only assertion a malicious lexer could uncover; I have not experimented much in detail on that front.


Again, since this is the deep end, the impact of these problems is very low. If you're playing with lexers you probably know you broke it. But there are a few philosophical discussions here that I think might shape development goals: is there any valid recovery when the system realizes a lexer isn't doing its job? And more generally, what is the level of responsibility that, say, the workbench takes in trying to prevent arbitrarily bad code from breaking the host process?

toddATavail commented 4 years ago

I wrote this on April 1, but it bounced, so here it is directly via the web form.


We have a hard and fast goal for errors with the project — an internal error / assertion failure should never be produced. Put another way, except in the cases that a user adds new code at the implementation level (Java/Kotlin) or invokes arbitrary pojo operations in an unsound way, the user should never encounter a hard crash from the compiler or the virtual machine, no matter what weird thing they tried to do in their Avail program.

Lexers are an advanced feature, but nevertheless, it is not appropriate for them to just crash the system when misused. I experienced a great deal of pain building the library lexers, because the system is not very graceful when you make a mistake.

This is definitely an area that needs extensive hardening and improvement. Because the number of users building lexers is low, however, and the capacity of the team is limited, this work unfortunately can't receive a great deal of priority right now. We should definitely revisit once the ongoing Kotlinization effort is complete.

I'll leave it to Mark to weigh in if he has other opinions.

Thank you for making such a thorough report!