TiddlyWiki / TiddlyWiki5

A self-contained JavaScript wiki for the browser, Node.js, AWS Lambda etc.
https://tiddlywiki.com/

textlength filter #1493

Closed tobibeer closed 8 years ago

tobibeer commented 9 years ago

Related to this discussion, I propose introducing a textlength filter with the following semantics...

The above is applied to any number of input titles. min:n is especially important for search performance (should be at least 2), and also the add-tag-popup so as to list existing tiddler titles for tagging.

nicholas-spies commented 9 years ago

Would this not make more sense as the following, where text is understood to be the text itself, or an expression that evaluates to text, such as ("A man", " ", "a plan", " ", "a canal: Panama") [if , concatenates text]?

text.length — returns the length of the text
text.length[3] — returns the text only if exactly of length 3
text.length[min:3] — returns the text only if at least 3 characters
text.length[max:10] — returns the text only if at most 10 characters
text.length[min:2,max:10] — returns the text only if at least 2 but no more than 10 characters
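The constraints listed above can be sketched in plain JavaScript. This is only an illustration: `lengthPredicate` and `filterByLength` are hypothetical names, and the spec grammar ("3", "min:3", "max:10", "min:2,max:10") is an assumption taken from the examples, not an existing TiddlyWiki API.

```javascript
// Hypothetical helper: turn a spec such as "3", "min:3", "max:10" or
// "min:2,max:10" into a predicate over strings (grammar assumed from the examples).
function lengthPredicate(spec) {
  let min = 0;
  let max = Infinity;
  for (const part of spec.split(",")) {
    const [key, value] = part.trim().split(":");
    if (value === undefined) {
      // A bare number means "exactly this length".
      min = max = Number(key);
    } else if (key === "min") {
      min = Number(value);
    } else if (key === "max") {
      max = Number(value);
    } else {
      throw new Error("Unknown length constraint: " + part);
    }
  }
  return (text) => text.length >= min && text.length <= max;
}

// Keep only the texts that satisfy the constraint.
function filterByLength(texts, spec) {
  return texts.filter(lengthPredicate(spec));
}

// Usage: only texts of 2 to 10 characters survive.
console.log(filterByLength(["a", "abc", "abcdefghijkl"], "min:2,max:10"));
```

Both bounds are treated as inclusive here, matching the "at least" and "at most" wording.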

This addresses the problem I see in filters: they are essentially expressed backwards and ignore the well-established tradition of . (dot) notation to refer to methods and data inside an OO object.

Also, there is an opportunity here that I feel is missed. I raise this in part because a man named Fred Hansen, who worked on the IBM Andrew Project at Carnegie Mellon University, devised a way of specifying what he called "slices" of text, in a manner similar to this. [I don't want to speak for him, but I thought it only right to mention him for his work in this area.]

In the following, the word "string" could be substituted for "text".

If one were to consider the spaces between characters in text, starting with 0 at the beginning and having the value of n after the last character, one could say:

text.subtext[0:5] would unambiguously specify the sub-string of text from characters 1-5 inclusive.

text.subtext[5:+6] could specify that you want 6 characters, starting just after the 5th character

text.subtext[-4:(length)] could specify that you want only those characters but the last 4 from the end (the word length in this context meaning text.length, the n mentioned above). I think that the meaning is clear enough that you would not have to spell it out.

text.subtext[9:-5] would specify that you want the 5 characters that end with the 9th character of text.

I believe Python uses a similar syntax for specifying substrings. There is no reason that an arbitrary bunch of substrings could not be extracted in one operation by simply adding parameters separated by commas.

text.subtext[2:+3,length:-4, 6:8]: that is, if text were "abcdefghijk", the result would be "bcdhijkfgh".
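A minimal sketch of the gap-indexed slicing described above, assuming the range forms `a:b`, `a:+k`, `a:-k` and the keyword `length`. The function name `subtext` and the clamping of out-of-range positions are illustrative assumptions, not part of TiddlyWiki.

```javascript
// Positions count the gaps between characters: 0 is before the first
// character and text.length is after the last one (assumed reading).
function subtext(text, spec) {
  const resolve = (token) =>
    token === "length" ? text.length : Number(token);

  return spec
    .split(",")
    .map((range) => {
      const [startToken, endToken] = range.trim().split(":");
      const start = resolve(startToken);
      let from = start;
      let to;
      if (endToken.startsWith("+")) {
        // k characters beginning at the start gap
        to = start + Number(endToken.slice(1));
      } else if (endToken.startsWith("-")) {
        // k characters ending at the start gap
        from = start - Number(endToken.slice(1));
        to = start;
      } else {
        // absolute end gap
        to = resolve(endToken);
      }
      // Assumption: positions past either end are clamped rather than rejected.
      from = Math.max(0, Math.min(from, text.length));
      to = Math.max(0, Math.min(to, text.length));
      return text.slice(from, to);
    })
    .join("");
}

console.log(subtext("abcdefghijk", "0:5"));  // abcde
console.log(subtext("abcdefghijk", "5:+6")); // fghijk
console.log(subtext("abcdefghijk", "9:-5")); // efghi
```

Under this strict gap-index reading, `2:+3,length:-4, 6:8` on "abcdefghijk" gives "cdehijkgh". The "bcdhijkfgh" quoted in the comment corresponds to reading `2:+3` and `6:8` as 1-based inclusive character positions, so the indexing convention would need to be settled before implementing anything.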

Another extension might be to use character literals as parameters of the method subtext. Unmodified, they would apply to the first instance of the character in text. Modified by a number (using a 1-based count for this purpose), "b";1 would be the 1st instance of "b" from the start (the same as if "b" were used unmodified), while "b";-3 would be the 3rd instance of "b" from the end of the text. Notice I have chosen ; to indicate a modifier. This would imply that text.subtext["a",+4] would be the four characters after the first instance of "a" in text, whereas text.subtext["b";3,-5] would be the five characters from the 3rd instance of "b" from the end of text.

We could split a string around a letter (or literal substring, say "and") with:

text.subtext[0,"abc"] would return the characters up to the first instance of "abc", while text.subtext["abc",length] would return the characters between the first instance of "abc" and the end of text.

Cases where the negation of the filter would be desired, what happens when the string is overrun by referring to beyond its end, etc. would have to be worked out.

All said, I feel that this, or a variation of it, would provide a very flexible means for extracting subtexts from text, and with a syntax that would start with the whole and specify the pieces desired by parameters.

nicholas-spies commented 9 years ago

A Slight Correction

The sentence: text.subtext["b";3,-5] would be the five characters from the 3rd instance of "b" from the end of text
should read: text.subtext["b";3,-5] would be the five characters counting backwards from the 3rd instance of "b"

I would say that failures should just produce the empty string "", or, if they can be resolved as partial answers, return the portion of the string up to the end, or down to the beginning, even if more characters were originally specified. These issues deserve some deep thought, so as to minimize unintended consequences...

nicholas-spies commented 9 years ago

It also occurred to me that the syntax mentioned above could be used to extract a series of base pairs in DNA at specified locations rather easily. Also, using spaces " " as a literal, and an implicit looping convention, it might be easy to parse a sentence into a list of words, as well.

nicholas-spies commented 9 years ago

And, of complete irrelevance, did you notice that today, 15 Feb 2015, is a palindromic date? 15.02.2015

nicholas-spies commented 9 years ago

Sorry to fill up your issue with (incorrect) observations: I should qualify the above by saying, for those with dyslexic tendencies, it is palindromic, but we'll have to wait until 15.02.1051 to be truly so. My apologies. :-)

Jermolene commented 9 years ago

@tobibeer are there any uses you can think of for textlength:max:10[text] and textlength:3[text]?

I wondered why you've got the text as the operand. Wouldn't it be more flexible to make this a filter that works on the accumulated result list?

tobibeer commented 9 years ago

@Jermolene,

Max and min could serve validation purposes. The text is the operand because the filter operates on it. I would have thought it easy to define it as a text-reference. In other words, I find it more natural to feed a filter a list of titles on which to operate, rather than other types of data.

@nicholas-spies,

I quite like the dot notation idea and think an object-property-notation for the filter operand (and the suffix) would be very useful and neatly extensible... could even be implemented in backwards-compatible ways: if we identify object-notation, parse that, otherwise simply assign the operand to the value property of an operand object.

Jermolene commented 9 years ago

@tobibeer I'm not convinced about the use cases for "max". Validation would ordinarily be better done by setting the length of the associated edit widget. So, I'd favour keeping things simple with a minlength operator, rather than textlength with a min or max suffix.

The text is the operand because the filter operates on it. I would have thought it easy to define it as a text-reference. In other words, I find it more natural to feed a filter a list of titles on which to operate, rather than other types of data.

I prefer filter operators to work on the input string list, using the operand as an optional parameter. You can still use a text reference: [{mytiddler!!field}minlength[3]].

The trouble with the way you've suggested is that it makes it much harder to apply the operator to anything that isn't a literal string, transclusion or variable - in particular, one can't easily apply your minlength to the output of a filter.
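To make the distinction concrete, here is a rough sketch of an operator that filters the incoming list and treats the operand only as a parameter. The `source` callback shape and the `operator.operand` field are simplified stand-ins chosen for this illustration; they are not copied from the TiddlyWiki core.

```javascript
// Sketch: the operator receives the accumulated titles via `source` and uses
// the operand purely as the threshold. All names here are illustrative.
function minlength(source, operator) {
  const min = parseInt(operator.operand, 10) || 0;
  const results = [];
  source((title) => {
    if (title.length >= min) {
      results.push(title);
    }
  });
  return results;
}

// Helper that turns a plain array into a `source` iterator for the sketch.
function makeSource(titles) {
  return (callback) => titles.forEach((title) => callback(title));
}

// Usage: the output of any earlier filter step can be fed straight in.
const typedSoFar = makeSource(["ab"]);
console.log(minlength(typedSoFar, { operand: "3" })); // "ab" is too short, so nothing passes
```

Because the input is just a list of strings, the same operator works whether those strings came from a text reference, a variable, or a previous filter step.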

tobibeer commented 9 years ago

in particular, one can't easily apply your minlength to the output of a filter

ok, I understand the rationale

as for any such filter ...I'll leave it to you, whichever it's going to be

nicholas-spies commented 9 years ago

I may be missing the point entirely, but why worry about the min and max length of a string when the possibility exists for specifying any arbitrary substring(s) using a simple, OO-type syntax? That is, there seems to be an implicit but unstated use for what @tobibeer suggests, which, IMHO, would be useful to spell out. While just knowing whether a string is larger or smaller in length than a given amount is obviously useful for something, it is no substitute for being able to slice and dice strings any way you like.

The text—any text (say T)—should have the inherited ability to know its own length (or len), which would be produced by T.len (or "abcdefg".len). By the same token, substring (or substr) would give, by means of T.substr(3:len), the substring of T from char 3 to the end of T. Negative indices would count backwards, from the end or an index or a string literal. String literals ("a" or "abc") could be used as markers within T from which to extract substrings (e.g. T.substr("abc":len) would produce the substring between the first instance of "abc" in T and the end of the text). [Whether these are inclusive or exclusive of the positions and literals used as delimiters would be a matter of taste and consistent results.]

Now, this may be impossible or undesirable in the core, as stated in Hangout #80, but it makes this approach no less attractive for a plugin, which, if I know JavaScript at all, I would be happy to write.

During my HyperCard programming days, a fellow named Dick Pountaine, from the University of Hull, I believe, wrote about two incredibly simple routines called Before() and After() in Byte magazine. I implemented them in C as HyperCard extensions, and got a lot of mileage from them. Each took the full string (use T="abcdefg" here) and a character or string literal within T, such that Before(T,"f") would produce "abcde" while After(T, "c") would produce "defg". Stupidly simple, but actually quite useful. Certainly much easier to use than a regex (which, of course, if implemented nicely, can do just about anything you want with text, but is tricky enough to have warranted at least one entire O'Reilly book on its use).
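For what it's worth, the two routines might look like this in JavaScript. This is a reconstruction from the description above, not the original code; in particular, returning an empty string when the marker is not found is an assumption.

```javascript
// Returns the part of `text` that precedes the first occurrence of `marker`.
// Assumption: "" is returned when the marker does not occur at all.
function before(text, marker) {
  const index = text.indexOf(marker);
  return index === -1 ? "" : text.slice(0, index);
}

// Returns the part of `text` that follows the first occurrence of `marker`.
function after(text, marker) {
  const index = text.indexOf(marker);
  return index === -1 ? "" : text.slice(index + marker.length);
}

const T = "abcdefg";
console.log(before(T, "f")); // abcde
console.log(after(T, "c"));  // defg
```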

--Nick


Jermolene commented 9 years ago

Hi @nicholas-spies

I may be missing the point entirely, but why worry about the min and max length of a string when the possibility exists for specifying any arbitrary substring(s) using a simple, OO-type syntax?

The driver for this ticket is wanting to be able to suppress search results until a minimum number of characters have been typed in the text box. I think that the ability to specify any arbitrary substring using an OO syntax is an orthogonal objective.

The examples you go on to give are incompatible with the established filter syntax of TiddlyWiki5.

As I said in the hangout, I'm sympathetic to experimenting with alternative filter languages, but I think we need to do it by first engineering a way to specify that a filter is expressed in a different language. Figuring out that mechanism is a necessary first step to implementing a plugin that implements such an alternative language.

Dick Pountaine, from the University of Hull, I believe, wrote about two incredibly simple routines called Before() and After() in Byte magazine.

I remember it too. Dick is very much still around, by the way, I follow him on Twitter.

The current TW prerelease features a proposed new filter operator "splitbefore" that implements similar logic:

http://tiddlywiki.com/prerelease/index.html#splitbefore%20Operator

It is used to construct the experimental system tiddler explorer:

http://tiddlywiki.com/prerelease/index.html#%24%3A%2Fcore%2Fui%2FMoreSideBar%2FExplore

nicholas-spies commented 9 years ago

Hi Jeremy!

Please give Dick my regards; I met him at the Rochester Forth Conference that used to be held at the University of Rochester, NY, around the time I co-wrote "Forth: A Text and Reference". You might mention to him that my co-author, Mahlon Kelly, who was also at Rochester, passed away more than a decade ago.

Thanks for explaining why it was important to know the min/max characters—I simply missed the first part of the discussion. Now knowing the reason, I can see that the generalized parsing of strings I was proposing would be absolutely beside the point. I expressed these thoughts in a non-compatible format from TiddlyWiki because I simply don't understand the rationale nor formalisms of TW filters, which, from my point of view is a failure of the documentation that I had been exposed to.

I thought that by reformatting (and in some cases, rewriting) the documentation that I would be helping others, too. However I have not been able to post what I have done onto TiddlySpot, as you suggested, and in addition do not know where the documentation exists for all the various other additions that are continually being added or whether I have been working with the latest version of it or whether it is being maintained in a central location at all. It seems that some people felt that my putting a lot of whitespace into the documentation was somehow patronizing to them ("unprofessional"), which I think is a rather odd response, because in order to recruit new users everything should be stated in as clear a manner as possible. Perhaps it is unconventional, but many aspects of TW are unconventional. Outlines are how people take notes at lectures, or at least used to, and similar liberties are taken with the formatting of poetry to make its internal form more easily appreciated. Because of not (as of this writing) having been able to post it, it hasn't been given a fair airing. But its usefulness is up to others to decide.

On a more personal note, methinks I am trying your patience a bit, but this is the last thing I want to do. I guess I just don't understand the constraints that I have to put on my thoughts, or what is useful to share and what I should keep to myself, in order not to become a source of annoyance to you and others on the project. Although it may be crystal clear to you as to where the bounds of the possible stop and useless babble begins, it is quite a bit less clear to me, and this is entirely because I am nowhere near up to speed on what the current objectives are, what is helpful and what is simply foolish to suggest. I seem to be doing too much of the latter. Perhaps this is not surprising because although I have written programs for hire, my main career was video editing, whose aesthetic decision-making has precious little to do with programming per se. Paradoxically, I probably have had a greater exposure to a variety of programming languages (from J to Squeak, MOZart to Python, Forth to OCAML, Prograph to assembler) than most of your developers.

Perhaps my difficulty reflects the split audience that TiddlyWiki serves (something also mentioned in Hangout 80), namely, those with the skill set to customize TW and those, such as myself, who are only users, with little ability to take advantage of what should be freely available to them. These camps are IMO somewhat at odds (though I don't want to emphasize this too much; just about everyone I have had contact with is helpful to a fault, and humble, too). This is why I feel the documentation should serve both groups rather than emphasizing the differences between the users and developers by having two sets of documentation. This would help so-called non-programmers to take advantage of the adaptability of TW while keeping developers focussed on the needs of non-programmers, a potentially far greater number of users. Ideally, TW should be practically self-evident to use, as HyperCard was: I think we would agree on this point.

You have made a wonderful and useful tool and I don't want to impede your progress.

Any thoughts on these matters?

Best,

Nick


tobibeer commented 8 years ago

@Jermolene, please consider implementing / merging a full fledged length filter as in the OP:

Demo

http://2603-len.tiddlyspot.com

Jermolene commented 8 years ago

Hi @tobibeer

In your example, is there a difference between [!len:min[4]] and [len:max[4]]?

I don't see the advantage of the suffix approach, nor the support for negation.

More problematically, the implementation isn't great: filters should avoid as much work as possible within the main iterator, and push all the static conditionals up out of the function.
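A rough sketch of what pushing the static conditionals out of the iterator could look like. The operator name `len`, the `operator.suffix` and `operator.prefix` fields, and the inclusive meaning of min and max are all assumptions made for this illustration.

```javascript
// Sketch: pick the comparison once, before iterating, so the loop body has
// no branching on the suffix or on negation. Field names are assumed.
function len(source, operator) {
  const n = parseInt(operator.operand, 10) || 0;

  // Static decisions happen here, outside the iterator.
  let test;
  switch (operator.suffix) {
    case "min":
      test = (title) => title.length >= n;
      break;
    case "max":
      test = (title) => title.length <= n;
      break;
    default:
      test = (title) => title.length === n;
  }
  const keep = operator.prefix === "!" ? (title) => !test(title) : test;

  const results = [];
  source((title) => {
    if (keep(title)) {
      results.push(title);
    }
  });
  return results;
}

// Usage with a simple array-backed source.
const titles = ["abc", "abcd", "abcde"];
const source = (callback) => titles.forEach((title) => callback(title));
console.log(len(source, { operand: "4", suffix: "min", prefix: "!" }));
console.log(len(source, { operand: "4", suffix: "max", prefix: "" }));
```

With inclusive bounds as assumed here, `!len:min[4]` and `len:max[4]` differ only for titles of exactly four characters: the former drops them and the latter keeps them.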

Anyhow, as I said elsewhere, I'd prefer a simple "maxlength" operator.

tobibeer commented 8 years ago

Anyhow, as I said elsewhere, I'd prefer a simple "maxlength" operator.

And another length operator? Mhhh.

Jermolene commented 8 years ago

And another length operator? Mhhh.

It's about consistency with the policies that are already used in the core; look at addprefix, addsuffix, etc.

tobibeer commented 8 years ago

@Jermolene Ok, a plugin it is. ;-)

In your example, is there a difference between [!len:min[4]] and [len:max[4]]?

It is especially important for !len[4] which thus begs for negation for all variants. Besides, negation adds expressivity, i.e. to have !minlength mean maxlength simply is bad practice.

I don't see the advantage of the suffix approach, nor the support for negation.

I find it improves filters by being more concise, as well as by clearly keeping filter functionality about the same thing in one operator.

More problematically, the implementation isn't great: filters should avoid as much work as possible within the main iterator, and push all the static conditionals up out of the function.

I'd be willing to improve that.

It's about consistency with the policies that are already used in the core; look at addprefix, addsuffix, etc.

I don't quite see that as a policy or convention but simply implementation preference. Yes, an efficient iterator is worthwhile. Verbosity in terms of the sheer number of operators, I find, not so much.

length[x]
minlength[x]
maxlength[x]

vs.

length[x]
length:min[x]
length:max[x]

To me, these are exactly the kinds of modifiers that suffixes are for. And, in this case, min and max work quite intuitively.

But then, again, in the following case, your approach leaves more room for extensibility:

maxlength:field-name[length]

rather than:

length:max field=field-name[length]

...so as to check for the length of the value of a given field of each input title, while keeping the title in the filter chain.
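As a sketch of that last idea, the following tests the length of a named field for each input title but passes the title itself down the chain. The `tiddlers` map, the use of the suffix as the field name, and the fallback to the title are assumptions for illustration only.

```javascript
// Sketch: maxlength:field-name[n] style check. Everything here is a stand-in;
// `tiddlers` is a plain Map from title to an object of field values.
function maxlength(source, operator, tiddlers) {
  const max = parseInt(operator.operand, 10);
  const field = operator.suffix || "title";
  const results = [];
  source((title) => {
    const tiddler = tiddlers.get(title) || {};
    const raw = field === "title" ? title : tiddler[field];
    const value = raw === undefined ? "" : String(raw);
    if (value.length <= max) {
      results.push(title); // keep the title, not the field value
    }
  });
  return results;
}

// Usage: filter titles by the length of their (hypothetical) caption field.
const store = new Map([
  ["Short", { caption: "Hi" }],
  ["Long", { caption: "A rather long caption" }],
]);
const allTitles = (callback) => [...store.keys()].forEach((t) => callback(t));
console.log(maxlength(allTitles, { operand: "10", suffix: "caption" }, store)); // only "Short" passes
```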