[request] slugify operator

twMat commented 6 years ago

Similar to the encodeuri operator, it would be useful to have something that converts a string into a field-name-friendly format and a CSS-identifier-friendly format, i.e.:

this-format : This is a field

.and-this-format { content:"This is a style definition" }

I believe the character rules are very similar for these (very different) use areas, so IMO a single general solution would be enough to cover both.

Why? Well, it would increase hackability; currently, creating fields or CSS definitions is a very hands-on matter. But, more importantly, it would make it possible to integrate the creation of fields etc. much more into the workflow, with field names potentially generated automatically from tiddler titles.

I understand if "you" have not come across any situations where you needed an "operator to create a field name", but ever since I realized that this should be possible I find more and more situations where it would really simplify things. Incidentally, there is overall surprisingly little discussion about custom fields and even CSS, so I suspect both of these areas are way under-exploited in TW.

AnthonyMuscio commented 6 years ago

twMat,

I support this idea because I have really started using custom fields of late. I would love to add fields programmatically, for example with a number prefix or suffix. In the mymenus plugin I could not easily automate the creation of menu-item-1, menu-item-2, ... fields as new menu items were added. I have also used menu-item-name-from-somewhere and removed the prefix "menu-item-" to get "name-from-somewhere" as the label in front of the field's (transcluded) value. It would be nice if we could change that to "name from somewhere" or even "Name from somewhere". Users could also provide field names from an input text field, correctly encoded, in which to save the value.

By adding a prefix or suffix to field names, and later removing it, you can use the existence of a field name as a value in itself, with the bonus that the field also has a value that can be stored.

I do think this could help hackability.

Regards Tony

Jermolene commented 6 years ago

@twMat are you proposing a new filter operator that converts an arbitrary string to a string of lower case letters, digits and dashes?

That operation is sometimes called to slugify a string. For example: https://blog.tersmitten.nl/slugify/

Good slugify routines also transliterate accented characters into their closest ASCII equivalent (eg é to e or Œ to OE).

One point to note is that most approaches do not guarantee uniqueness, with multiple strings mapping to the same slug (eg, HélloThere and HelloThere both mapping to hellothere).
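To make the description above concrete, here is a minimal JavaScript sketch. It is not the xememex or TiddlyWiki code: it assumes that Unicode NFD decomposition plus a tiny hand-written table (the hypothetical `EXTRA_TRANSLITERATIONS`) can stand in for a real transliteration table. It also reproduces the uniqueness caveat, with two different titles yielding the same slug.

```javascript
// Minimal slugify sketch. Assumption: NFD decomposition plus a tiny
// hand-written table stands in for a full transliteration table.
const EXTRA_TRANSLITERATIONS = { "œ": "oe", "æ": "ae", "ø": "o", "ß": "ss" };

function slugify(text) {
  return String(text)
    .toLowerCase()
    // Characters that NFD does not decompose (ligatures etc.) need a table
    .replace(/[œæøß]/g, (ch) => EXTRA_TRANSLITERATIONS[ch])
    // Split accented letters into base letter + combining mark, drop the marks
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    // Every run of characters outside a-z/0-9 becomes a single dash
    .replace(/[^a-z0-9]+/g, "-")
    // Trim leading and trailing dashes
    .replace(/^-+|-+$/g, "");
}

// Two different titles collide on the same slug:
console.log(slugify("HélloThere"));     // "hellothere"
console.log(slugify("HelloThere"));     // "hellothere"
console.log(slugify("Œuvre complète")); // "oeuvre-complete"
```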

@AnthonyMuscio I think you're talking about something different: perhaps a macro to return a field name given a prefix and a list of the fields already present on a tiddler?

twMat commented 6 years ago

@Jermolene , yes, that's exactly what I'm asking for. slugify[] - LOL!

> One point to note is that most approaches do not guarantee uniqueness, with multiple strings mapping to the same slug (eg, HélloThere and HelloThere both mapping to hellothere).

Yes. I assume it's the same "issue" with the encodeuri filter op and possibly also encodehtml and encodeuricomponent, but anyone using these is probably knowledgeable enough to be aware of this.

Side note: Reading up a little on "slug", I'm thinking it might even be useful for creating urls. Not only to make them pretty but apparently also for SEO.

twMat commented 6 years ago

addendum

That last link also has a slugify library section containing this code:

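```javascript
var slugifyText = function (text) {
    return text.toString().toLowerCase()
        .replace(/\s+/g, '-')        // Replace whitespace runs with -
        .replace(/[^\w\-]+/g, '')    // Remove all non-word chars except -
        .replace(/\-\-+/g, '-')      // Collapse multiple - into one
        .replace(/^-+/, '')          // Trim - from start of text
        .replace(/-+$/, '');         // Trim - from end of text
};
```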

I'm not sure what is supposed to happen with characters that are not listed there tho.
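One way to answer that is simply to run the snippet. The block below repeats it so that it runs standalone. In JavaScript, `\w` matches only ASCII `[A-Za-z0-9_]`, so every other character is silently deleted rather than transliterated:

```javascript
// The slugifyText snippet from above, unchanged in behaviour.
var slugifyText = function (text) {
  return text.toString().toLowerCase()
    .replace(/\s+/g, '-')        // whitespace runs -> single dash
    .replace(/[^\w\-]+/g, '')    // delete anything not in [A-Za-z0-9_-]
    .replace(/\-\-+/g, '-')      // collapse repeated dashes
    .replace(/^-+/, '')          // trim leading dashes
    .replace(/-+$/, '');         // trim trailing dashes
};

// Accented and non-Latin characters are dropped, underscores survive:
console.log(slugifyText("Café au lait"));  // "caf-au-lait"
console.log(slugifyText("Größe"));         // "gre"
console.log(slugifyText("日本語"));         // "" (empty string)
console.log(slugifyText("snake_case ok")); // "snake_case-ok"
```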

Jermolene commented 6 years ago

> Yes. I assume it's the same "issue" with the encodeuri filter op and possibly also encodehtml and encodeuricomponent - but anyone using these is probably knowledgeable enough to be aware of this.

Both encodehtml and encodeuri do have the property of having unique mappings that do not clash for every input string, making them reversible.
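A quick way to see the difference, using the built-in `encodeURIComponent`/`decodeURIComponent` and a deliberately simple, hypothetical `lossySlug` rule:

```javascript
// encodeURIComponent is injective: distinct inputs give distinct outputs,
// so decodeURIComponent always recovers the original string.
const titles = ["HélloThere", "HelloThere", "Hello There"];

for (const title of titles) {
  const encoded = encodeURIComponent(title);
  console.log(encoded, decodeURIComponent(encoded) === title);
}

// A slug is lossy. This hypothetical rule (lowercase, split off accents,
// drop everything outside a-z/0-9) maps all three titles to one value.
const lossySlug = (s) =>
  s.toLowerCase().normalize("NFD").replace(/[^a-z0-9]/g, "");
console.log(new Set(titles.map(lossySlug)).size); // 1
```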

> Side note: Reading up a little on "slug", I'm thinking it might even be useful for creating urls. Not only to make them pretty but apparently also for SEO.

Indeed, xememex.com already uses a slugify function for URLs. If a tiddler has a "slug" field then that value is used as the slug, otherwise it transliterates special characters, falling back to using numeric character codes:

```javascript
exports.slugify = function(wiki,title) {
    var tiddler = wiki.getTiddler(title),
        slug;
    if(tiddler && tiddler.fields.slug) {
        slug = tiddler.fields.slug;
    } else {
        slug = $tw.utils.transliterate(title.toString().toLowerCase()) // Replace diacritics with basic lowercase ASCII
            .replace(/\s+/g,"-")                                       // Replace spaces with -
            .replace(/[^\w\-\.]+/g,"")                                 // Remove all non-word chars except dash and dot
            .replace(/\-\-+/g,"-")                                     // Replace multiple - with single -
            .replace(/^-+/,"")                                         // Trim - from start of text
            .replace(/-+$/,"");                                        // Trim - from end of text
    }
    // If the resulting slug is blank (eg because the title is just punctuation characters)
    if(!slug) {
        // ...then just use the character codes of the title
        slug = title.split("").map(function(char) {
            return char.charCodeAt(0).toString();
        }).join("-");
    }
    return slug;
};
```
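The function above depends on TiddlyWiki's `$tw` and a real wiki. To try the logic in plain Node.js, here is a sketch with hypothetical stand-ins: `transliterate` only strips combining accents (far less than a real table), and `makeWiki` is a toy in-memory store. The three calls exercise the `slug` field override, transliteration, and the character-code fallback.

```javascript
// Hypothetical stub: strips combining accents only (a real table is much larger)
function transliterate(str) {
  return str.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Hypothetical toy wiki exposing only getTiddler(title)
function makeWiki(tiddlers) {
  return {
    getTiddler: (title) =>
      tiddlers[title] ? { fields: Object.assign({ title }, tiddlers[title]) } : undefined
  };
}

// Same steps as the xememex function quoted above
function slugify(wiki, title) {
  const tiddler = wiki.getTiddler(title);
  let slug;
  if (tiddler && tiddler.fields.slug) {
    slug = tiddler.fields.slug; // an explicit "slug" field wins
  } else {
    slug = transliterate(title.toString().toLowerCase())
      .replace(/\s+/g, "-")
      .replace(/[^\w\-\.]+/g, "")
      .replace(/\-\-+/g, "-")
      .replace(/^-+/, "")
      .replace(/-+$/, "");
  }
  if (!slug) {
    // Fallback for titles made only of punctuation: dash-joined char codes
    slug = title.split("").map((ch) => ch.charCodeAt(0).toString()).join("-");
  }
  return slug;
}

const wiki = makeWiki({ "Home Page": { slug: "index" } });
console.log(slugify(wiki, "Home Page"));    // "index"
console.log(slugify(wiki, "Crème Brûlée")); // "creme-brulee"
console.log(slugify(wiki, "?!"));           // "63-33"
```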

I've been meaning to move this over to the core -- the main discouragement is writing all the documentation :(

twMat commented 6 years ago

> I've been meaning to move this over to the core -- the main discouragement is writing all the documentation :(

That's good news, in some sense ;-) I may be naive, but documenting it seems very to-the-point (hah, a slug!). Is there anything more to mention beyond what it's for and a list of what is converted into what? So, how about you add the code and I do the docs, or at least a draft of them?

Jermolene commented 6 years ago

A proper implementation would extend to using the slugify operator to generate URLs for rendered tiddlers (whether static or over HTTP), and that does take some documentation because of the pitfalls. Users need to understand how to detect the situation of multiple titles clashing to the same slug, and how to recover from it.
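Detecting such clashes amounts to grouping titles by their slug. A sketch, with a hypothetical `slug` rule standing in for whichever slugify is actually used:

```javascript
// Hypothetical slug rule for illustration only
function slug(title) {
  return title.toLowerCase().normalize("NFD").replace(/[\u0300-\u036f]/g, "")
    .replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Group titles by slug and keep only groups containing more than one title
function findSlugClashes(titles, slugFn) {
  const bySlug = new Map();
  for (const title of titles) {
    const s = slugFn(title);
    if (!bySlug.has(s)) bySlug.set(s, []);
    bySlug.get(s).push(title);
  }
  return [...bySlug].filter(([, group]) => group.length > 1);
}

const clashes = findSlugClashes(["HélloThere", "HelloThere", "Hello, There!", "Other"], slug);
console.log(clashes);
```

With the xememex scheme above, recovering from a reported clash would mean giving one of the clashing tiddlers an explicit `slug` field.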

AnthonyMuscio commented 6 years ago

I thought Mat's suggestion was primarily about forming field names.

Setting a string to lowercase and replacing spaces with - may be enough, while only accepting a-z and 0-9.
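That minimal rule can be sketched as follows (`simpleFieldName` is a hypothetical name). Note what happens to non-ASCII letters: under this rule they are simply dropped.

```javascript
// Minimal rule: lowercase, spaces -> "-", keep only a-z, 0-9 and "-".
// No transliteration: anything else is removed.
function simpleFieldName(text) {
  return text.toLowerCase()
    .replace(/ +/g, "-")
    .replace(/[^a-z0-9-]/g, "");
}

console.log(simpleFieldName("Menu Item 2"));  // "menu-item-2"
console.log(simpleFieldName("Größe ändern")); // "gre-ndern"
```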

Regards Tony

Jermolene commented 6 years ago

Indeed, but the answer to Mat's question is to add a core slugify operator because the requirements are the same, and if we do that we have to support using it for its most useful application. The enhancements in the referenced code make sense for field names just as much as tiddler titles.

AnthonyMuscio commented 6 years ago

I thought Mat's suggestion was primarily about forming field names.

Setting a string to lowercase and replacing spaces with - may be enough, while only accepting a-z and 0-9.

Regards Tony

twMat commented 6 years ago

(@AnthonyMuscio - double post)

Indeed, my request was for field/CSS names, but @Jermolene I see your point.

What's the policy about adding code; must documentation come at the same time? If people are to contribute, I can see a point in adding the code independently: (1) it gives others a chance to contribute docs, even if I understand the chances of that are slim, and (2) early adopters will expose pitfalls that are valuable when authoring docs.

joshuafontany commented 6 years ago

This would be very handy for processing JSON key names into tiddler field names (or for checking their legality and giving the user options). I'm working on the next version of my JSON plugin (full wiki syntax support for nested JSON objects, path/pointer references, null/boolean values, etc.). I almost have a release candidate...
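A legality check could look like the sketch below. It assumes the field-name rule TiddlyWiki documented at the time (lowercase letters, digits, `_`, `-` and `.`); the pattern and the `classifyKeys` helper are illustrative, not part of any plugin's API.

```javascript
// Assumption: legal field names use only lowercase letters, digits, "_", "-", "."
const LEGAL_FIELD_NAME = /^[a-z0-9_\-.]+$/;

// Split JSON keys into those usable as field names as-is and those that
// need converting (or a decision from the user).
function classifyKeys(obj) {
  const legal = [];
  const illegal = [];
  for (const key of Object.keys(obj)) {
    (LEGAL_FIELD_NAME.test(key) ? legal : illegal).push(key);
  }
  return { legal, illegal };
}

const result = classifyKeys(JSON.parse('{"name":"x","First Name":"y","zip-code":1,"Ort":"z"}'));
console.log(result);
```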

Documentation is still almost non-existent, hrm, gotta work on that. My backwards compatible approach means that "foo" and "/foo" refer to the same namespace, so that paths can be written as "foo/0/bar" or "/foo/0/bar". And if that namespace exists as both a nested object and a top-level path-named key-value pair in the same object, I had to decide on priorities for read/write operations, and how to "deduplicate" them. Gotta get that all documented, :P.

I also have a "Tokenized Sort" filter on my ToDo list. I've already worked out the code for sorting JSON tiddlers by key name, so I'd just have to re-implement it as a filter. Seeing the slug[] filter built would help.

Mahalo, Joshua Fontany


pmario commented 6 years ago

> Setting a string to lowercase and replacing spaces with - may be enough, while only accepting a-z and 0-9.

What about öä and other stuff !!!

I'm pretty sure CSS will have problems with special chars... That's why we needed "Custom styles by data-tags".

twMat commented 5 years ago

> I've been meaning to move this over to the core -- the main discouragement is writing all the documentation :(

A friendly bump to ask if this is still the intention, now that it's a field operator party ;-) This one (and #3803 ) would allow for very cool stuff.

Jermolene commented 5 years ago

Hi @twMat no, the idea now is to avoid big changes until after v5.1.20.

twMat commented 5 years ago

OK, thanks. I didn't see it as a big change, albeit an important one ;-)

twMat commented 5 years ago

An insight, just for the record:

It seems to me that the number produced by the qualify widget could be replaced with a slug of the current tiddler's title, making the qualified string easier to read (and understand).
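A sketch of that idea. `qualifyWithSlug` is hypothetical and is not how the core qualifies strings; since slugs can clash, two different titles could also end up sharing a qualified string.

```javascript
// Hypothetical slug rule
function slugify(title) {
  return title.toLowerCase().normalize("NFD").replace(/[\u0300-\u036f]/g, "")
    .replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Hypothetical helper: append a readable slug of the current tiddler's title
function qualifyWithSlug(base, currentTiddler) {
  return base + "-" + slugify(currentTiddler);
}

console.log(qualifyWithSlug("$:/state/popup", "My Tiddler")); // "$:/state/popup-my-tiddler"
```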

twMat commented 4 years ago

If what is hindering this is that it is just too difficult to know how to handle conflicting conversions, then maybe an implementation is possible where the user is expected to create their own custom dictionary for the conversion, i.e. let the user decide what becomes of é so it doesn't conflict with e?
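A sketch of what such a user-controlled mapping might look like, assuming the user map takes priority over default accent stripping. The choice of `"e2"` for `é` is arbitrary:

```javascript
// userMap entries override the default behaviour (strip combining accents)
function slugifyWith(userMap, text) {
  let out = "";
  for (const ch of text.toLowerCase()) {
    out += Object.prototype.hasOwnProperty.call(userMap, ch)
      ? userMap[ch]
      : ch.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
  }
  // Runs outside a-z/0-9 become single dashes; trim the ends
  return out.replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

const userMap = { "é": "e2" };
console.log(slugifyWith({}, "HélloThere"));      // "hellothere"
console.log(slugifyWith(userMap, "HélloThere")); // "he2llothere"
console.log(slugifyWith(userMap, "HelloThere")); // "hellothere"
```

The override separates this particular pair, but a user map cannot guarantee uniqueness in general: `"He2lloThere"` still produces `"he2llothere"`.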

Jermolene commented 4 years ago

Hi @twMat the transliteration mapping table is huge and I don't think it makes much sense to expose it to users. The scheme I mentioned above allows users to override an automatically generated slug with a "slug" field. The Xememex implementation also includes an [all[duplicateslugs]] operator, making it possible to warn the user interactively if there are duplicate slugs.

twMat commented 4 years ago

> The scheme I mentioned above allows users to override an automatically generated slug with a "slug" field. The Xememex implementation also includes an [all[duplicateslugs]] operator, making it possible to warn the user interactively if there are duplicate slugs.

An idea might be to reuse the mechanism that creates unique titles ("New Tiddler", "New Tiddler 1", ...) to ensure slugs are always unique by default.
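Sketched in JavaScript, with a `-1`, `-2`, ... suffix in place of the space-separated numbers used for tiddler titles (`uniqueSlug` is a hypothetical helper):

```javascript
// If baseSlug is taken, append -1, -2, ... until a free slug is found
function uniqueSlug(baseSlug, existing) {
  if (!existing.has(baseSlug)) return baseSlug;
  let n = 1;
  while (existing.has(baseSlug + "-" + n)) n++;
  return baseSlug + "-" + n;
}

const taken = new Set();
for (const s of ["hellothere", "hellothere", "hellothere", "other"]) {
  taken.add(uniqueSlug(s, taken));
}
console.log([...taken]); // ["hellothere", "hellothere-1", "hellothere-2", "other"]
```

The result depends on the order in which slugs are assigned, so the assigned slugs have to be stored somewhere to stay stable.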

Jermolene commented 4 years ago

Hi @twMat indeed, that works if one is assigning slugs to tiddlers and storing them as a field on the tiddler. But part of the point is to be able to generate slugs dynamically, without necessarily having access to previously generated slugs.

joshuafontany commented 4 years ago

Hi guys,

I'm just wrapping up a sophisticated CSV "import" process, with a new $csv widget that renders type:application/csv tiddlers with a UI (peek, pages, toggle headers, import-to-json-or-tiddlers, etc.).

A slugify[] method/filter would be so useful that I'm going to just implement a basic $tw.utils.slugify() method from the above code for my import process. I would love to see this in the core with the full transliteration mapping table in the near future. :)

Best, Joshua F

twMat commented 4 years ago

@joshuafontany - did you implement this? If the mapping is externalized into a separate dictionary tiddler, then the user can himself decide what is a sufficient degree of transliteration. I would love to be able to just do slugify<string> and have it come out converted... and for any "substring" that didn't come out right, I'd just add an entry to the dictionary. It would be my responsibility to ensure the output fulfills the contextual criteria.
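A sketch of the dictionary idea, assuming the dictionary is stored as JSON text in a tiddler (the tiddler and its format are hypothetical). Keys are matched as substrings, longest first, and must be lowercase because the input is lowercased first; anything left outside `a-z0-9` becomes a dash.

```javascript
// Hypothetical dictionary tiddler text: substring -> replacement
const dictionaryTiddlerText = '{"ö":"oe","ä":"ae","ü":"ue","ß":"ss","c++":"cpp","&":"and"}';

function makeConverter(jsonText) {
  const dict = JSON.parse(jsonText);
  // Longer keys first, so a multi-character entry wins over shorter overlaps
  const keys = Object.keys(dict).sort((a, b) => b.length - a.length);
  return function convert(text) {
    const s = text.toLowerCase();
    let out = "";
    for (let i = 0; i < s.length; ) {
      const key = keys.find((k) => s.startsWith(k, i));
      if (key) { out += dict[key]; i += key.length; }
      else { out += s[i]; i++; }
    }
    return out.replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  };
}

const convert = makeConverter(dictionaryTiddlerText);
console.log(convert("Größe & C++ Tipps")); // "groesse-and-cpp-tipps"
console.log(convert("Café"));              // "caf" (é has no entry yet)
```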

Actually, not sure slugify is the right name then since the scope is broader than to create, specifically, slugs.

Jermolene commented 4 years ago

@twMat I've already coded this for use in xememex.com static site generation. I'll move it into the core when I can -- feel free to remind me.

twMat commented 4 years ago

Thanks @Jermolene , I recall you said the documentation was the problem. Is that possible for you or anyone else to work on?

Jermolene commented 4 years ago

Hi @twMat the docs (and tests) are always the time consuming part. For example, the new subsort operator that I worked on this morning took about 20 minutes to write the actual code, but 2 hours to get it ready for merging. (In fact, in that case I did it the right way round by writing the docs and the tests before the code, but the point remains).

Jermolene commented 4 years ago

But to answer your question about somebody else writing the docs, I think it would be quicker for me to write the docs than prepare a PR with enough docs for someone else to write the docs, if you see what I mean.

twMat commented 4 years ago

Something doesn't seem right if a requested solution already exists but can't be published because of lacking docs. Maybe things could be unofficially published, and then the urgency of producing docs would be indicated by the number of people requesting help, the number of requests for help, or which specific people or user categories make the requests. I imagine this workflow would even help with the actual doc formulation, by highlighting which aspects people get stuck on and which use cases they have that might be suitable for doc examples.

But I'll follow up on your invitation to...

> I'll move it into the core when I can -- feel free to remind me.

Thank you :-)

Jermolene commented 4 years ago

> Something doesn't seem right if a requested solution already exists but can't be published because of lacking docs

It's quite common! Writing docs is hard, and yet it is an integral part of adding a new feature.

> Maybe things could be unofficially published

That's what already happened further up this thread...

twMat commented 4 years ago

> > Maybe things could be unofficially published
>
> That's what already happened further up this thread...

Ah, you're right! Thanks. I'll remind about docs in a while then ;-)

pmario commented 4 years ago

> (In fact, in that case I did it the right way round by writing the docs and the tests before the code, but the point remains).

@twMat ... That's the right way to do it. ... The initial docs and tests have to be "right", otherwise "others" don't have the chance to "improve" the documentation. ... Developer docs have the tendency to be too "sparse" for newbie users and still need a lot of time to create them. ... Because they have to be "right"

twMat commented 4 years ago

Yes, thanks @pmario :-)

joshuafontany commented 4 years ago

> @joshuafontany - did you implement this? If the mapping is externalized into a separate dictionary tiddler, then the user can himself decide what is a sufficient degree of transliteration. I would love to be able to just do slugify<string> and have it come out converted... and for any "substring" that didn't come out right, I'd just add an entry to the dictionary. It would be my responsibility to ensure the output fulfills the contextual criteria.
>
> Actually, not sure slugify is the right name then since the scope is broader than to create, specifically, slugs.

I love the idea of setting the mapping up as a dictionary! I did give this a shot based on the code shared upthread. Here is the JS file: https://github.com/joshuafontany/TW5-JsonMangler/blob/master/modules/filters/slugify.js and the TiddlyWiki *.js.meta file: https://github.com/joshuafontany/TW5-JsonMangler/blob/master/modules/filters/slugify.js.meta

Best, Joshua F


[request] slugify operator #3379