gamesys / moonshine

A lightweight Lua VM for the browser
http://moonshinejs.org
MIT License
502 stars 35 forks source link

string.match -- allow matches of square brackets as magic characters #29

Closed markbrockington closed 7 years ago

markbrockington commented 9 years ago

Removed some problematic code that was preventing matching of square brackets as magic characters.

paulcuth commented 9 years ago

Hi Mark,

Looking at the tests that you've added, the issue seems to be that '%[' is translated to a JavaScript RegExp that is invalid and causes an error to be thrown. Is that correct?

Removing that block of code does fix your issue, but the code is there for a reason and removing it breaks other patterns. There is a secondary issue here regarding the lack of tests for string.match(); if they were present, it might have been more apparent that for loop is required. I'll open a separate issue for that.

I believe the code you removed addresses the issue of unescaped brackets within char-sets. Having a char-set inside a char-set is not valid, therefore the second opening bracket can only be treated as a bracket and doesn't require escaping.

print( string.match( '[4,3]', '[[]' ))    --> [

While this is also the case in JavaScript, there was a browser--I don't remember which one now--that didn't like the unescaped second bracket meaning that we had to escape it explicitly. I'll try to find which browser it was.

However, it seems that neither the existing code nor this pull-request addresses all the issues around this well. I'll add some tests today and, if you fancy the challenge, would welcome an updated PR.

Paul.

paulcuth commented 9 years ago

Hi Mark,

Although I clearly remember having a test case when writing that code, I now (a) can't find a single commit, bug report or email that sheds light on what that use case was, and (b) I can not repeat any errors in any current browser. It's possible that whatever issue there was with the translated RegExp, it was fixed at the browser end. Either way, I'm confident enough and happy to merge this PR.

I've added tests for more use cases related to brackets in patterns, some of which fail in Moonshine (not in C Lua). Therefore I'm going to keep this on a branch (pattern-fixes) until all tests pass. If you fancy having a go getting them to pass, you're more than welcome. Otherwise I'll tackle this in the new year.

Paul.

paulcuth commented 9 years ago

After the festive break and with a mind as fresh as the year, I can immediately see what the removed code was trying to solve. I can also see that it would only work on the simplest of cases, but still needs to be addressed.

Take, for example, the following Lua pattern that matches underscores and letters:

[_%a]

Considering that %a itself is translated to a character class in JavaScript, it would result in the following:

[_[a-zA-Z]]

I'm not entirely sure what this would match (maybe an extra [ and/or only when followed by a ]) but it's certainly not what we want. Therefore, the code that you removed was there to remove the inner brackets, resulting in:

[_a-zA-Z]

Perfect. So that's why it's there.

Alas, it's not quite so perfect. It would completely fail on negated sets, like %D. For example, to match a zero or non-numerical character, [0%D] would translate to:

[0^\d]

The negation loses it's meaning when not at the beginning of the set.

What I'm essentially saying is that although translatePattern() fails in some cases with or without this PR, it is possible that this PR will break existing code that runs correctly on Moonshine.

Getting translatePattern() to work properly is pretty much a project in itself. With your knowledge of C, you may be better equipped than myself to port relevant code from C Lua and implement a better version. What are your thoughts on that?

paulcuth commented 9 years ago

I added some tests for the issues identified in my last post: https://github.com/gamesys/moonshine/commit/fc5655f46ffeea2912f55aa60ef1a70895b3a6f4

paulcuth commented 7 years ago

I'm going to close this PR as it's been inactive for 2.5 years. If anyone would like to contribute, please free free to do so here and I'll reopen.