fstirlitz / luaparse

A Lua parser written in JavaScript
https://fstirlitz.github.io/luaparse/
MIT License
459 stars 91 forks source link

Add errors for invalid goto/label structure #72

Closed teoxoy closed 5 years ago

teoxoy commented 5 years ago

closes #71

Also turns on esversion: 6 for jshint. I think it's a non issue, even IE11 has partial support that includes the features I used for this PR (let, const, Map and arrow function).

This project has hardly been very keen at keeping up with the Kardashians latest developments in the JavaScript world.

IMO the project should go trough a conversion stage to take advantage of ES6 features anyway.

I have also tested this PR against the official lua test file for goto.

fstirlitz commented 5 years ago

This squashes multiple unrelated changes into one commit, one of which is a major decision I have not yet decided whether to make, and should be considered in a separate pull request.

Besides, the use of ES6 is completely avoidable here: const and fat-arrow functions are mere syntactic features, and because you only use Map to map integers to integers, it can be replaced with a plain object or a sparse array.

I'm also not sure I like your choice of data structures (and algorithms) anyway. The gotoInfo array is only ever added to, and contains pretty much every declaration, goto and label in the whole script, and is scanned in a nested loop each time a goto needs to be matched against a label, so it looks like it runs in roughly O((local_declarations + labels + gotos)²) time in the worst case. Even in the best case, where there are no labels or goto in the script at all, it runs in O(local_declarations) time.

Plus, your checker accepts this code, which is nonsensical:

function x() goto y end ::y::

I'm definitely not going to merge this as-is. Instead, I might be able to extend my code that addresses #70 to also cover goto and labels.

teoxoy commented 5 years ago

I agree, I should have made a separate issue/PR for es6... I will remove the es6 stuff.

analyzeGotoInfo is only ran once - at the end of the parsing stage. The time is O(n).

When it comes to that function, I will add a check for it.

fstirlitz commented 5 years ago

The loop is only ever run once, but in its body there are calls to filter and findIndex, which take linear time. There's even one sort, which takes O(n log(n)) or O(n²) depending on the implementation; although it's on a filtered list, the worst case is that almost everything passes the filter, so for a rough estimate from above such as this it doesn't matter. So the worst case can be estimated as O(n³), even worse than before.

Also, the loop makes the assumption that one scope index being smaller than another means that the latter scope is contained within the former, which is not correct. Take this:

do
    do
        ::x::
    end
    do
        goto x
    end
end

Your checker considers this valid. If I swap the label with the goto, it rejects the code. PUC-Rio Lua and LuaJIT reject both.

teoxoy commented 5 years ago

I didn't think speed was such an issue... the usage of goto is a novelty imo.

How would you improve this?

fstirlitz commented 5 years ago

Well, for example the Ace editing component calls luaparse to implement on-the-fly syntax checking. It does debounce it, obviously, but it still means that potentially luaparse may run every couple of keystrokes. So yes, speed is kind of an issue. Of course, being an editor based on manipulating HTML DOM makes it horribly inefficient anyway, but when the user's browser inevitably grinds to a halt while processing a sufficiently long script, I don't want it to be my fault.

In particular, I don't want goto-free code to pay the bulk of the price of this feature, though I realise some overhead will be inevitable anyway.

So, how to accomplish this? I'm not entirely sure yet, but I imagine something like this: for each function body (and the top level chunk, but I'll refer to just 'bodies' below) a stack of lists of labels and a list of unresolved gotos is created. Each block pushes a new list of labels on the stack. Every time a block ends, gotos are resolved against labels on the stack and removed from the list of unresolved gotos, while this block's list of labels is popped from the stack. By the time a body ends, generate an error for each unresolved goto still on the list. Just have to take care of the variables as well.

teoxoy commented 5 years ago

What makes this problem inherently difficult is that you can jump forward. So you need information that might be after a goto declaration. This is why I implemented the check at the very end.

So are you going to reimplement this?

fstirlitz commented 5 years ago

You can jump forward, but you cannot jump across function boundaries, and you cannot jump to a descendant block. Which means that by the time a block ends, you already know which gotos pointing to its labels are valid, and by the time a body ends, you already know which gotos point to nowhere. So you can mark gotos as valid as you encounter their corresponding labels, and only remember labels from the current and ancestor blocks.

fstirlitz commented 5 years ago

Well, what I wrote is basically how this should be implemented, all it takes now is to write it in JavaScript instead of English. So yes, probably. I might incorporate your tests, though.

fstirlitz commented 5 years ago

Got something working in my local copy, I expect to commit it this week. As such, I'm closing this one.