TiddlyWiki / TiddlyWiki5

A self-contained JavaScript wiki for the browser, Node.js, AWS Lambda etc.
https://tiddlywiki.com/

[IDEA] String manipulation operators should be able to deal with \n\t .. and others #4697

Open pmario opened 4 years ago

pmario commented 4 years ago

Also see the GG (Google Groups) thread.

splitregexp allows us to detect "new-lines" and split.

join should have a similar possibility, e.g. join:special[\n] would be one option. The set of allowed operands should be limited to \n, \t and maybe HTML entities, e.g. &nbsp;, but I'm not sure here, since &nbsp; already causes a lot of problems in the TW UI.

I'm not sure if joinregexp makes sense as an operator name. For me it doesn't.

Other candidates for a :special variant are addprefix, addsuffix

Jermolene commented 4 years ago

I wonder if we might resolve this more generically by introducing a new type of quote symbol that marks escapable strings using JavaScript string literal syntax. Backticks would join <>, {}, [] as ways of delimiting the operand of any filter operator. For example:

[join`\n`]
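
To illustrate what decoding such an operand could involve, here is a minimal sketch in plain JavaScript. The helper `decodeEscapes` and its table of supported escapes are assumptions made for illustration; they are not part of TiddlyWiki and only cover a handful of sequences.

```javascript
// Hypothetical helper, not part of TiddlyWiki: decode a few JavaScript-style
// escape sequences found in the text between two backticks.
const SIMPLE_ESCAPES = { n: "\n", t: "\t", r: "\r", "\\": "\\", "`": "`" };

function decodeEscapes(raw) {
  // Replace backslash + known character; leave unknown sequences untouched.
  return raw.replace(/\\(.)/gs, (match, ch) =>
    Object.prototype.hasOwnProperty.call(SIMPLE_ESCAPES, ch)
      ? SIMPLE_ESCAPES[ch]
      : match
  );
}

// The operand of [join`\n`] arrives as two characters: a backslash and "n".
const operand = decodeEscapes("\\n");
console.log(["aaa", "bbb"].join(operand)); // prints aaa and bbb on separate lines
```

Unknown sequences are passed through unchanged here, so existing text containing stray backslashes would not suddenly produce errors.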

AnthonyMuscio commented 4 years ago

Jeremy,

It is a very good idea to leverage the "JavaScript escaped literal syntax" standard.

It is what I hinted at in the original thread. This allows us to expose the underlying functionality and standards of JavaScript to the widgets(?) and operators. No reinventing the wheel, but expanding capabilities. I expect that this, along with exposing a few more of the underlying methods, would substantially reduce the need for JavaScript plugins, making TiddlyWiki more accessible to users/superusers.

Theoretically it will permit the inclusion of all these

and more arising there.

My question, however, is whether we can use these escape sequences as a rule, e.g.:

split[\n] join[\n] match[\n] then[\n] else[\n]

Perhaps for backward compatibility we need a parameter like

else:esc[\n!newHead]

where esc says: allow JavaScript escaping within plain text.

AnthonyMuscio commented 4 years ago

Post script,

I just wanted to point out something about my involvement with TiddlyWiki. I am intentionally remaining in the superuser category of users. Whilst I plan to write JavaScript later, this places me in a deliberately constrained environment. This constraint lets me see TiddlyWiki without reverting to custom coding/plugins. As a result, as in this particular case, I tend to argue for facilities delivered by wikitext, widgets and macros to enhance functionality and hackability for "non-coders".

I hope this serves a function within our community, and request your forbearance in the light of this.

Also, welcome back Jeremy. Get healthy. I did not respond to the "Personal New" topic to avoid overwhelming you, but I share the heartfelt gratitude that you are recovering.

Yours Sincerely TonyM

AnthonyMuscio commented 4 years ago

Here is another example, assuming we can include escaped content. It also highlights the growing power of filters when used with triple braces.

{{{ [all[current]then:esc[this\n]] }}}

This could be used in reports, tables etc... to add anything possible with the escape sequence.

It also highlights how a shorter form of all[current] would be very helpful,

e.g. with a symbol of your choice: {{{ [#then[this]] }}} where the hash is replaced with [all[current]]

It would make a lot of macros much more readable, and make it easier to check that you are not about to list "[all[]" in the wiki, the default behaviour. This is a kind of filter pre-processor.
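
A rough sketch of what such a pre-processor could look like, in plain JavaScript. The `#` shorthand and the function `expandCurrentShorthand` are hypothetical and do not exist in TiddlyWiki. The sketch only tracks square brackets, so it assumes operands do not contain unbalanced `[` or `]`.

```javascript
// Hypothetical pre-processor: expand "#" at the start of a filter run
// into all[current]. The "#" shorthand is an assumption, not TiddlyWiki syntax.
function expandCurrentShorthand(filter) {
  let depth = 0;
  let out = "";
  for (let i = 0; i < filter.length; i++) {
    const ch = filter[i];
    if (ch === "[") {
      depth++;
      out += ch;
      // Only a "#" directly after the bracket that opens a run is expanded.
      if (depth === 1 && filter[i + 1] === "#") {
        out += "all[current]";
        i++; // skip the "#"
      }
    } else {
      if (ch === "]") depth = Math.max(0, depth - 1);
      out += ch;
    }
  }
  return out;
}

console.log(expandCurrentShorthand("[#then[this]]")); // [all[current]then[this]]
```

A `#` inside an operand or a title, as in `[tag[#x]]` or `[[#foo]]`, is left alone because it does not sit directly after the bracket that opens the run.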

Regards Tony

pmario commented 4 years ago

@Jermolene What would join:special[asdf\n\tsome text] look like using your syntax?

or

\define specialtext()
xxx\n\n\tsome text
line above contains an invisible new-line
\end

{{{ [[aaa bbb]join:special<specialtext>]}}}

Jermolene commented 4 years ago

Hi @pmario

Which part of my proposal was confusing? The first example would be:

join:special`asdf\n\tsome text`

This proposal does not address the second example. That's something else, and would clearly require an alternative syntax for macro definitions.

@AnthonyMuscio the trouble with the idea of using a suffix is that each and every filter operator has its own implementation of suffixes, because different operators use suffixes differently. So, applying a universal change like this would require altering all of the existing filter operators (and updating the docs), whereas my proposal requires a single change in a single place, that can also be documented in a single place.
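
The "single change in a single place" argument can be sketched with a toy model. This is not TiddlyWiki's real filter parser: `operators`, `decodeOperand` and `runStep` are hypothetical names, and only \n and \t are decoded. The point is that the operators receive an ordinary string and never need to know which delimiter was used.

```javascript
// Toy model, not TiddlyWiki's real parser: operators take a plain string
// operand and stay unchanged when a new delimiter is introduced.
const operators = {
  join: (titles, operand) => [titles.join(operand)],
  addprefix: (titles, operand) => titles.map((t) => operand + t),
  addsuffix: (titles, operand) => titles.map((t) => t + operand),
};

// The one place where escapes are handled (only \n and \t in this sketch).
function decodeOperand(text) {
  return text.replace(/\\n/g, "\n").replace(/\\t/g, "\t");
}

// Parses one step such as join[, ] or join`\n` (hypothetical backtick syntax).
function runStep(step, titles) {
  const m = /^([a-z]+)(?:\[([^\]]*)\]|`([^`]*)`)$/.exec(step);
  if (!m || !Object.prototype.hasOwnProperty.call(operators, m[1])) {
    throw new Error("Unsupported step: " + step);
  }
  // Square brackets keep the text literal; backticks are decoded here, once.
  const operand = m[3] !== undefined ? decodeOperand(m[3]) : m[2];
  return operators[m[1]](titles, operand);
}

console.log(runStep("join[\\n]", ["aaa", "bbb"])); // joined with a literal backslash + n
console.log(runStep("join`\\n`", ["aaa", "bbb"])); // joined with a real newline
```

A suffix-based design would instead need each entry in `operators` to inspect its own suffix and decode the operand itself.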

AnthonyMuscio commented 4 years ago

Jeremy

I see why a different set of "braces" would make sense in many ways. If I understand correctly, it allows you to process the escape strings before they are passed to the existing operators? So the operators can continue to work as is? As you say, one change for all operators (as long as they can cope with the special characters).

The quotes you gave in the example, however, are not a good choice. There seems too little difference between the left and right braces, and on QWERTY I think they are in different places. But beggars can't be choosers.

It is visually hard to scan a filter with such a different representation.

Is it not practical to provide a method to insert these via a variable or transclusion, or even a set of global macros holding the values?

It is already necessary to do this with regex strings, so it would follow an existing pattern as well.

E.g. \define escseq() .\n or, whatever it takes, \define escseq() '.\n'

And then use it in the filter.

Unless there are other braces available in filters?

No need to explain if these ideas are impractical, but if they are workable, please consider them.

This would be a good feature.

rmunn commented 4 years ago

I like the idea of backticks for a new sort of "brace" that interprets JavaScript escape sequences. There are a couple of "gotchas" that I worry about:

  1. Please forbid octal escapes (e.g., \101 for uppercase A). TiddlyWiki should be aimed at users who are non-programmers, and octal is just plain confusing to everybody, including programmers. The only place where octal is at all useful is in Unix file permissions, and everywhere else it needs to die and be buried like the mistake it was. And since they're deprecated in JavaScript, we shouldn't allow them.
  2. Following up on point 1, please DON'T make the character sequence \101 an error, but instead treat it as four characters: backslash followed by three digits. Making \101 an error would just cause confusion among non-programmers and frustration among programmers.
  3. I worry about surrogate pairs. Surrogate pairs are even worse than octal at confusing everyone, programmers and non-programmers alike. It might be nice to disallow the surrogate pair range entirely from Unicode escapes, so that \uXXXX allows the four hex digits XXXX to be in the range 0000-D7FF or E000-FFFF. The error message on encountering a Unicode escape in the D800-DFFF range could be customized to say "Please don't use surrogate pairs: replace \uD83D\uDE00 with \u{1F600} which represents the same character". (With, of course, the actual surrogate pair that the user used in the string). Also, invalid surrogate characters that aren't part of a pair would need their own error message that doesn't suggest the Unicode escape.
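
The three points above can be sketched as a small decoder. This is only an illustration under stated assumptions: `decodeSaneEscapes` and `surrogateError` are hypothetical names, the set of simple escapes is arbitrary, and the error wording follows the suggestion in point 3.

```javascript
// Hypothetical decoder for a restricted subset of JavaScript escapes.
function decodeSaneEscapes(raw) {
  const simple = { n: "\n", t: "\t", r: "\r", "\\": "\\", "`": "`" };
  let out = "";
  for (let i = 0; i < raw.length; i++) {
    if (raw[i] !== "\\" || i + 1 >= raw.length) { out += raw[i]; continue; }
    const next = raw[i + 1];
    if (Object.prototype.hasOwnProperty.call(simple, next)) {
      out += simple[next];
      i += 1;
      continue;
    }
    if (next === "u") {
      const braced = /^\{([0-9A-Fa-f]{1,6})\}/.exec(raw.slice(i + 2));
      if (braced) {
        const cp = parseInt(braced[1], 16);
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
          throw new Error("Invalid code point in \\u{" + braced[1] + "}");
        }
        out += String.fromCodePoint(cp);
        i += 1 + braced[0].length;
        continue;
      }
      const digits = raw.slice(i + 2, i + 6);
      if (/^[0-9A-Fa-f]{4}$/.test(digits)) {
        const unit = parseInt(digits, 16);
        // Point 3: reject the surrogate range D800-DFFF.
        if (unit >= 0xD800 && unit <= 0xDFFF) throw surrogateError(raw, i, unit);
        out += String.fromCharCode(unit);
        i += 5;
        continue;
      }
    }
    // Points 1 and 2: anything else, including octal such as \101,
    // is kept as literal text rather than decoded or rejected.
    out += raw[i];
  }
  return out;
}

// Builds the customised message, suggesting \u{...} for a full pair.
function surrogateError(raw, i, unit) {
  const pair = /^\\u([Dd][89ABab][0-9A-Fa-f]{2})\\u([Dd][C-Fc-f][0-9A-Fa-f]{2})/
    .exec(raw.slice(i));
  if (pair) {
    const cp = (parseInt(pair[1], 16) - 0xD800) * 0x400 +
      (parseInt(pair[2], 16) - 0xDC00) + 0x10000;
    return new Error("Please don't use surrogate pairs: replace " + pair[0] +
      " with \\u{" + cp.toString(16).toUpperCase() +
      "} which represents the same character");
  }
  return new Error("Invalid lone surrogate \\u" + unit.toString(16).toUpperCase());
}

console.log(decodeSaneEscapes("\\101")); // the four characters \101, unchanged
```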

Point 3 is a little bit of a departure from the ECMAScript standard so it might be controversial, but I like the idea of TiddlyWiki's JavaScript-like escapes being a sane subset of what ECMAScript allows. (And surrogate pairs, while an unfortunate technical necessity as an internal representation if your language was designed before UTF-8 took over the world, are no part of a "sane" UI that should be exposed to users.)

Alternately, you could just say that Unicode escapes in TiddlyWiki will always represent codepoints, not code units, and so \u will be followed by 1-6 hex digits representing the Unicode codepoint. (And the range D800-DFFF would be forbidden as those are invalid Unicode codepoints).
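
A minimal sketch of this alternative, assuming a hypothetical function `decodeCodepointEscapes` that greedily reads up to six hex digits after \u. The greedy matching is an assumption of the sketch, not something specified above.

```javascript
// Hypothetical variant: \u followed by 1-6 hex digits is always a code point.
function decodeCodepointEscapes(raw) {
  return raw.replace(/\\u([0-9A-Fa-f]{1,6})/g, (match, hex) => {
    const cp = parseInt(hex, 16);
    if (cp >= 0xD800 && cp <= 0xDFFF) {
      throw new Error("\\u" + hex + " is in the surrogate range and is not a valid code point");
    }
    if (cp > 0x10FFFF) {
      throw new Error("\\u" + hex + " is beyond the last Unicode code point");
    }
    return String.fromCodePoint(cp);
  });
}

console.log(decodeCodepointEscapes("smile: \\u1F600")); // the U+1F600 character after "smile: "
```

One consequence of the greedy reading is that hex-looking text directly after an escape is swallowed: \u41BC would mean U+41BC, not "A" followed by "BC". A real design would need a rule for ending the escape.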