Mudlet / Mudlet

⚔️ A cross-platform, open source, and super fast MUD client with scripting in Lua
https://mudlet.org
GNU General Public License v2.0
739 stars 267 forks source link

[BUG] replaceAll - unexpected behavior #4311

Open DocNessuno opened 4 years ago

DocNessuno commented 4 years ago

Summary:

replaceAll misbehaves (fails to replace the string without any error or replaces the string in odd/unexpected ways) when the string involves pipes and asterisks


Steps to reproduce:

Perl trigger: \|(.+)\| Trigger body: replaceAll(matches[1],"WORKS")

Test output:

  1. You say '|Test|'
  2. You say '|:|'
  3. You say '|*|'
  4. You say '|Test:|'
  5. You say '|Test*|'
  6. You say '|:*|'
  7. You say '|Test:*|'

Expected behavior:

You say 'WORKS'


Actual behavior:

  1. You say 'WORKS'
  2. You say 'WORKS'
  3. You say 'WORKS*WORKS'
  4. You say 'WORKS'
  5. You say '|Test*|'
  6. You say '|:*|'
  7. You say '|Test:*|'

Extra information: Mudlet 4.10.0 - Windows 10 using selectString() and replace() does not incur in the same issues using replaceAll(matches[2],"WORKS") sorta functions as expected but not really (for example replaces '|Test:|'` with '|WORKS|'- note the spurious the issue present itself even when the pipes are not the delimiting elements, such as in the following Perl regex(?:<!ENTITY Gruppo "tank:.+?|mob:.+?)?|(\w+?)when the output is... tank:|mob:*|Ashur ...`

Delwing commented 3 years ago

Hard to judge whether this is a bug or not.

Current implementation of replaceAll allows usage of Lua patterns for word parameter. So if you match the word with Lua expression special chars like * it will act according to that special char logic. It really doesn't matter that this comes from string matched by trigger.

It would be possible to code escaping of those characters in replaceAll, but in that case we lose control - we just force users to use plain strings and not leverage Lua patterns if needed.

For details how to escape string for find or gsub you can go to https://stackoverflow.com/questions/9790688/escaping-strings-for-gsub

And as for the issue itself I see two possible outcomes:

  1. we update docs, to state that word can use Lua pattern matching
  2. we escape all special chars, always treating word as plain substring
vadi2 commented 3 years ago

Let's go with (1), better functionality in the end there. I also thought we had a function to escape text for Lua patterns, but I can't find it, hmm.

Delwing commented 3 years ago

Yep, you can always escape, but cannot unescape :D

DocNessuno commented 3 years ago

Current implementation of replaceAll allows usage of Lua patterns for word parameter. So if you match the word with Lua expression special chars like * it will act according to that special char logic. It really doesn't matter that this comes from string matched by trigger.

Understandable, but the behavior seems even different from LUA (particularly, I struggle to follow the logic of example number 3).

Let's go with (1), better functionality in the end there. I also thought we had a function to escape text for Lua patterns, but I can't find it, hmm.

I would suggest at least having an escape function available so something like replaceAll(luaEscape(matches[1]),"WORKS") could be used, while I understand your concern, I believe that the cases were users want an exact match for replace or replaceAll severely outnumber the ones where they are interested in using LUA special characters.

Delwing commented 3 years ago

https://gitspartv.github.io/lua-patterns/?pattern=%7C*%7C

So first pipe can show up 0 or more times. Second pipe must always be present.

Possible matches for the pattern:

| -> |* this allows 0 matches || -> |* 1 match ||| etc.

Therefore in example 3 - both pipes are matched and replaced.

I have no idea whether this makes clear explanation.

DocNessuno commented 3 years ago

https://gitspartv.github.io/lua-patterns/?pattern=%7C*%7C

So first pipe can show up 0 or more times. Second pipe must always be present.

Possible matches for the pattern:

| -> |* this allows 0 matches || -> |* 1 match ||| etc.

Therefore in example 3 - both pipes are matched and replaced.

I have no idea whether this makes clear explanation.

I see your point, but consider that the pipes are not sent to the replaceAll command at all (to my understanding at least)

Example 3 should be equivalent to replaceAll("*","WORKS") on You say '|*|'

Delwing commented 3 years ago

What do you mean they're not sent? They are part of the pattern.

Example 3 is not, nor should be equivalent to replaceAll("*","WORKS") Single * is * literally, while |* is match 0+ |