Raku / problem-solving

🦋 Problem Solving, a repo for handling problems that require review, deliberation and possibly debate
Artistic License 2.0

Add `nomark` for stripping accents, `samemark` is counterintuitive and slow for that purpose. #427

Closed (bbkr closed 6 months ago)

bbkr commented 6 months ago

Stripping accents to get base characters is a very common operation in text indexing and searching. Raku allows doing this through:

"mówić".samemark( "a" )

There are two issues with that method: it is counterintuitive, and it is slow when used only to strip marks.

My proposal is to add an easy-to-use, optimized method for stripping accents:

say "mówić".nomark()   # mowic

Preferably one that avoids reallocating the whole string by splitting and joining when there are no marks to strip, because this method will usually be called blindly on any given input, with or without marks. As Raku is more commonly used with LLMs, this may be a good addition to the language.
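The requested behaviour can be sketched in user space roughly as follows. This is only an illustration: `strip-marks` is a hypothetical helper name, not the proposed built-in and not the code from any Rakudo pull request. It assumes that "marks" means codepoints whose Unicode General_Category starts with `M`, and its shortcut still pays for one NFD decomposition, so it avoids only the rebuild of the string, not the scan.

```raku
# Hypothetical helper, for illustration only.
sub strip-marks(Str:D $text --> Str:D) {
    # Canonical decomposition: "ó" becomes "o" followed by U+0301.
    my $nfd = $text.NFD;

    # Shortcut: one codepoint per grapheme means nothing decomposed
    # into base + mark, so hand back the original string untouched.
    return $text if $nfd.elems == $text.chars;

    # Otherwise drop every codepoint in a mark category (Mn, Mc, Me)
    # and rebuild the string from what is left.
    $nfd.list
        .grep({ !uniprop($_, 'General_Category').starts-with('M') })
        .map(*.chr)
        .join
}

say strip-marks("mówić");   # mowic
say strip-marks("plain");   # plain
```

For the example word this gives the same result as `"mówić".samemark("a")`, while the intent is stated directly instead of through a dummy pattern string.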

coke commented 6 months ago

Regexes have not only samemark but also ignoremark

https://docs.raku.org/syntax/%3Aignoremark

In case we wanted to follow that naming convention. (I realize it's not a 100% match)
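For context, a small sketch of what the regex adverb does: it changes how a match is performed and leaves the string itself untouched, which is presumably why it is not a 100% match for a method that returns a stripped copy. The variable name is made up for the example.

```raku
my $word = "mówić";

# Plain match: "ó" and "ć" are not the same graphemes as "o" and "c".
say so $word ~~ / mowic /;               # False

# With :ignoremark only base characters are compared.
say so $word ~~ m:ignoremark/ mowic /;   # True
```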

bbkr commented 6 months ago

@coke: I think this is an early design inconsistency, probably dating back to the Apocalypses.

That's why I'm not a big fan of copying this mixed naming. I was also thinking about the name basemark.

lizmat commented 6 months ago

https://github.com/rakudo/rakudo/pull/5562 contains an implementation.