Taitava / obsidian-shellcommands

Execute system commands via hotkeys or command palette in Obsidian (https://obsidian.md). Some automated events are also supported, and execution via URI links.
GNU General Public License v3.0

Bug: Variable escaping corrupts four-byte unicode characters, e.g. emojis. 🐓 #171

Closed Taitava closed 2 years ago

Taitava commented 2 years ago

Discussed in https://github.com/Taitava/obsidian-shellcommands/discussions/170#discussioncomment-2280710

I happened to find out something interesting in shell command preview:

[Image: shell command preview]

The shell command does not do anything useful, it just echoes the file path plus the emoji. But the preview text is interesting: the emoji that I manually typed at the end of the command is displayed correctly in the gray preview text, whereas the emoji that comes from the {{file_path:relative}} variable is corrupted. I suspect the problem is somewhere in the variable parsing logic: something there does not support these kinds of special characters.

After inspecting further, I found that this regex does not handle unicode characters that are encoded with more than two bytes (JavaScript strings are UTF-16, so such a character is stored as two 16-bit code units): https://github.com/Taitava/obsidian-shellcommands/blob/9eca3355893d5ded67c79669f3e878b9d2495158/src/variables/escapers/AllSpecialCharactersEscaper.ts#L10

The regex splits e.g. 🐓 into two characters and escapes each of them separately, with a backquote ` (PowerShell) or a backslash \ (Bash/Dash/Zsh). So, 🐓 becomes: `�`� (PowerShell) or \�\� (Bash/Dash/Zsh). The correct result would be: `🐓 (PowerShell) or \🐓 (Bash/Dash/Zsh).
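To illustrate why the split happens, here is a minimal sketch in plain TypeScript, independent of the plugin's code. The variable names are made up for illustration. It shows that 🐓 is a single code point but two UTF-16 code units, and that the same character class matches it twice without the unicode flag.

```typescript
// 🐓 is one Unicode code point (U+1F413) but is stored as two UTF-16 code units.
const rooster = "🐓";

console.log(rooster.length);                       // 2 (UTF-16 code units)
console.log(Array.from(rooster).length);           // 1 (code points)
console.log(rooster.codePointAt(0)?.toString(16)); // "1f413"

// Same character class as in the issue, without and with the unicode flag.
// Without /u, each half of the surrogate pair is matched on its own.
const matchesWithoutU = rooster.match(/[^\w\d]/g) ?? [];
const matchesWithU = rooster.match(/[^\w\d]/gu) ?? [];

console.log(matchesWithoutU.length); // 2
console.log(matchesWithU.length);    // 1
```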

The problem can be fixed by adding a unicode flag to the regex pattern:

- return this.raw_value.replace(/[^\w\d]/g, (special_character: string) => {  // /g means to replace all occurrences instead of just the first one.
+ return this.raw_value.replace(/[^\w\d]/gu, (special_character: string) => {  // /g means to replace all occurrences instead of just the first one. /u means to handle four-byte unicode characters correctly as one character, not as two separate characters.
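The effect of the change can be reproduced outside the plugin with a small sketch. The two functions below are hypothetical stand-ins, not the plugin's actual escaper classes. They assume that escaping simply means prefixing each matched character with the shell's escape character, as described above.

```typescript
// Hypothetical helper: the buggy behaviour (no unicode flag).
// Each UTF-16 code unit of a surrogate pair gets its own escape character.
function escapeWithoutUnicodeFlag(rawValue: string, escapeChar: string): string {
    return rawValue.replace(/[^\w\d]/g, (specialCharacter: string) => escapeChar + specialCharacter);
}

// Hypothetical helper: the fixed behaviour (with the /u flag).
// A surrogate pair is matched, and therefore escaped, as one character.
function escapeWithUnicodeFlag(rawValue: string, escapeChar: string): string {
    return rawValue.replace(/[^\w\d]/gu, (specialCharacter: string) => escapeChar + specialCharacter);
}

const examplePath = "notes/🐓.md"; // made-up file path for illustration

// Bash-style escaping with a backslash:
console.log(escapeWithUnicodeFlag(examplePath, "\\"));    // notes\/\🐓\.md
console.log(escapeWithoutUnicodeFlag(examplePath, "\\")); // emoji is split and shown corrupted

// PowerShell-style escaping with a backquote:
console.log(escapeWithUnicodeFlag("🐓", "`")); // `🐓
```

The only difference between the two helpers is the u flag, which matches the one-character change in the diff above.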

This bug was introduced in version 0.7.0 when implementing #11. So, unescaped variable values (the {{! exclamation mark variable }} syntax) are not affected by this bug.


I'll add the unicode flag to all regex patterns in the whole plugin. I'll compile a list of all the changed regex patterns here.

Commit ffcedc069091effd5114ba7a43d92b31d089354b fixes the original bug.

Commit b496091c0ade6fe4e5ca14145ba2f93ed9b3e9a9 adds the /u modifier to the following other regex patterns:

Taitava commented 2 years ago

Should be fixed now, will be released later in 0.11.1.

Taitava commented 2 years ago

Released.

Taitava commented 2 years ago

I added one more fix commit (db85689a7638117a2796c4af4a71c23cec8a5c64) related to this bug. However, I did not notice any problem occurring without this particular fix, so it's only a "just in case" fix. It will be released together with #178. I don't think I'll add this to CHANGELOG.md, as it's such a small change.

Edit: Wrong commit id.