dworkin / dgd

Dworkin's Game Driver, an object-oriented database management system originally used to run MUDs.
https://www.dworkin.nl/dgd/
GNU Affero General Public License v3.0
103 stars 31 forks source link

Array lvalue error #46

Closed nyankers closed 3 years ago

nyankers commented 3 years ago

Running the following command causes a server crash on Cloud Server as admin:

code ({ a, b }) = ({ 1, 2 })

The error message is:

Fatal error: bad stack pointer after function call

It looks like this expression in interpreter.cpp is always popping the right-hand array, even when codegen.cpp's expr() method says the resulting operation should not be popped, so when you have a return statement (as above) or an lvalue (x = ({a, b}) = ({1, 2}) also fails, for instance), I assume it is trying to use and then pop that array again. My analysis could be wrong though, since so far I'm not able to resolve this.

nyankers commented 3 years ago

It looks like this was the cause after all, though confusingly just checking I_POP_BIT is not sufficient to resolve this (attempting to do so caused Cloudlib to fail), but passing the pop value from expr() in the byte does work.

It limits the number of lvalues possible, but while checking this, I found that there's currently no limit, eg. if you have 256+ left-hand variables in a ({a, b}) = x style expression, it starts showing very unusual behavior.

As a proof-of-concept solution, I added a 127 limit here (it'll error out using Compile::error() if this is exceeded) and used the 128-bit to pass along whether the array should be popped or not. I'll submit a pull request for reference. :)

dworkin commented 3 years ago

Good work. Two separate bugs, attempting to use the value of the aggregate assignment and the 1 byte counter issue.

nyankers commented 3 years ago

At a glance, I'd guess JIT works the same way as it also always pops the value, unless there's some safety mechanism there that interpret.cpp doesn't have... however, I'm not altogether sure how to test JIT, or else I'd offer to patch it as well.

Though if you think x = ({a, b}) = y type structures are invalid, failing to compile them might be easier. The only language I know of that uses them all that much seems to support them (it's Javascript), but it's a pretty unusual use-case outside of things like code snippets where you default to handling the return value to print it to the user.

dworkin commented 3 years ago

I think of ({ a, b }) = ({ 1, 2 }) as a shorthand (which happens to be implemented in a particular way for performance reasons), so it isn't all that clear what the value of it as an expression should be. The array? Or the value of the last assignment?

I certainly always thought of it as something that should not have a value.

dworkin commented 3 years ago

I appreciate the work you're doing, but I will implement the fixes myself, and then discuss them in this space.

nyankers commented 3 years ago

That's fine! It's really just meant to highlight the problem and help me understand the code base along the way, and besides I still have no idea how to test any changes to JIT. I also wrote this without any understanding what I_STORES|I_POP_BIT was popping, though I think I get it now (the integer return value of sscanf() is my guess, the I_POP_BIT getting put on I_STORES rather than the kfun call itself). I don't see very many use cases for it being a value anyway, besides read-eval-print functions like cloudlib's code command.

I could see some pattern like if (!(({a, b}) = c)) { /* handle c == nil, and maybe sizeof(c) < 2 *? } being neat. The downside is this would invariably require removing any safeties on ({a, c}) = c, forcing the coder to implement them instead. As such, this can already be handled by doing if (catch(({a,b}) = c)) { /* ... *? } but I'm not sure how expensive catching errors is.

Actually, looking over at cloudlib's /usr/System/sys/errord, it's not all that cheap there... I might want to modify how that works a bit more in my environment, haha.

As for the one other language I know where this is a common feature, Javascript just passes the original array along just as the patch I wrote does, eg. ({a, b, c}) = ({d, e}) = ({1, 2, 3, 4}) (and its javascript equivalent) would have a and d at 1, b and e at 2, and c at 3, with the 4 getting dropped. The argument I could see for it is that it meets the general expectation that (a = b) will have the same value as b, e.g. ({a}) = ({b}) = ({c}) will work, as well as f(({a}) = ({b})) and so forth.

shentino commented 3 years ago

I think of ({ a, b }) = ({ 1, 2 }) as a shorthand (which happens to be implemented in a particular way for performance reasons), so it isn't all that clear what the value of it as an expression should be. The array? Or the value of the last assignment?

I certainly always thought of it as something that should not have a value.

Standard assignment semantics would seem to favor preserving C/C++ behavior, and in my opinion the aggregate assignment to the lvalue elements is the exception, so I guess it could be like a bus that makes a pit stop over the variables inside the array, drops off the values, and then keeps driving towards the final lvalue before parking. Presumably if the "head" lvalue in the statement is itself a variable it could also preserve reference semantics for the array at the tail end of the assignment chain.

As for the javascript example nyankers cited my first hunch is that it would cause an array size mismatch. I confirmed this and it returned a "wrong number of lvalues" error.

Attempting to confirm this bug on my mud caused it to crash and that's my fault for monkeying with production code :P. Lesson learned, but it does confirm that this "reuse the syntactic sugar" issue does in fact exist and causes an abort.

shentino commented 3 years ago

More specifically, ({ a, b }) = \<somearray> IMO should itself have the value of \<somearray>, just as if it was \<somearrayvar> = \<somearray>, since the splitting up into separate values for assignment to individual lvalue elements is a DGD specific elaboration.

Though also I just thought of this:

What if you wanted to do:

({ a, b }) = ({ x, y }) = SomeFunctionThatReturnsA2ElementArrayAndMaybeKeepsAReference()?

And basically iterate the shorthand in cases where you needed to make multiple spreads of the same array?

Edit: forgot to escape the angle brackets

Edit 2: forgot to illustrate that the array provided in the second example may already be referenced elsewhere before being returned by the function

dworkin commented 3 years ago

I've committed my changes, please have a look.

dworkin commented 3 years ago

e719975014c343c39e4412c2161e0b14a5609fd5: aggregate assignment has no value as an expression 627d20cc865745a56531c389b4663678b4b7c2ad: I_STORES takes 2 bytes as an argument

nyankers commented 3 years ago

Looks like it's resolved! I was slightly horrified at the idea of storing sixty thousand lvalues, but the stack apparently does not like going that deep in the first place, so it will fail long before the new counter can overflow (though I'm also personally horrified at storing a hundred of them, I guess it could happen in some weird LPC code generation...)

Edit to clarify: fail to compile, not a runtime failure.

nyankers commented 3 years ago

As an aside, anyone taking this update would probably have to recompile stuff as the old single byte argument code won't work anymore. Personally I just upgraded /kernel/sys/driver and /kernel/lib/auto and called it good, but before I realized this I was wondering why my newly configured JIT (thanks to the other issue being resolved) was just choking and dying.

dworkin commented 3 years ago

The STORES argument also had to be extended to 16 bits in the extension interface, that has now been fixed.