SonOfLilit / kleenexp

modern regular expression syntax everywhere with a painless upgrade path
MIT License
73 stars 16 forks

Unicode matching #30

Open mikaelho opened 2 years ago

mikaelho commented 2 years ago

Hello, and thanks for a very interesting package!

I would suggest that the #letter macro be tweaked to match modern Python behaviour: \w in a Unicode (str) pattern matches a Unicode letter like ä, whereas the current implementation of #letter does not.
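To make the difference concrete, here is a minimal sketch using only Python's standard `re` module. It assumes, without checking the kleenexp source, that #letter currently compiles to something like `[A-Za-z]`; the names `ascii_letter`, `word_char`, `unicode_letter`, and `matches` are illustrative only. Note that `\w` also matches digits and the underscore, so `[^\W\d_]` is shown as a common approximation of "Unicode letter".

```python
import re

# Assumption: roughly what #letter compiles to today (ASCII letters only).
ascii_letter = re.compile(r"[A-Za-z]")

# \w in a str pattern is Unicode-aware by default in Python 3.
word_char = re.compile(r"\w")

# The same \w restricted to ASCII via the re.ASCII flag.
word_char_ascii = re.compile(r"\w", re.ASCII)

# Common idiom approximating "Unicode letter": a word character
# that is neither a decimal digit nor an underscore.
unicode_letter = re.compile(r"[^\W\d_]")


def matches(pattern, ch):
    """Return True if the pattern matches the whole string ch."""
    return pattern.fullmatch(ch) is not None


print(matches(ascii_letter, "ä"))     # ASCII-only class
print(matches(word_char, "ä"))        # Unicode-aware \w
print(matches(word_char_ascii, "ä"))  # \w with re.ASCII
print(matches(unicode_letter, "ä"))   # letter-only approximation
```

Whether #letter should follow `\w` exactly or a letter-only class like the last one is a design choice this sketch does not settle.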

SonOfLilit commented 2 years ago

I agree with this suggestion and intend to implement it in the future, but it's a nontrivial task and I want to launch first.

mikaelho commented 2 years ago

Makes sense.

Your use of the word "launch" and the stated goal of displacing re made me want to ask whether you have seen this list (https://github.com/mikaelho/python-human-regex), also known as "the competition"?

SonOfLilit commented 2 years ago

I've seen it now; it's very well researched.

I noticed that in the "trying to use" section, some packages try to parse "hello (a-123)", some try to parse "hello [a-123]", and kleenexp tries to parse "hello [a-123)". How about using the expression:

[[capture:title 1+ #any] ' ' #tag=[[capture:key #letters] '-' [capture:id #digits]] ['(' #tag ')' | '[' #tag ']']]

To only parse those you actually want?
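For comparison, here is a hand-written approximation of the same idea using Python's standard `re` module. It is a sketch, not the output of kleenexp: the names `TAG`, `PATTERN`, and `parse` are made up for illustration, #letters is assumed to mean one or more ASCII letters, and `fullmatch` is used so the whole string must fit the pattern. Because `re` does not allow the same group name twice, the two bracket styles get separate groups and the key and id are split afterwards.

```python
import re

# Assumption: #letters ~ [A-Za-z]+ and #digits ~ [0-9]+
TAG = r"[A-Za-z]+-[0-9]+"

# Title, a space, then the tag wrapped in matching () or [] only.
PATTERN = re.compile(
    rf"(?P<title>.+) (?:\((?P<round>{TAG})\)|\[(?P<square>{TAG})\])"
)


def parse(text):
    """Return (title, key, id) or None if the brackets do not pair up."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return None
    # Exactly one of the two alternatives matched.
    tag = m.group("round") or m.group("square")
    key, _, ident = tag.partition("-")
    return m.group("title"), key, ident


print(parse("hello (a-123)"))
print(parse("hello [a-123]"))
print(parse("hello [a-123)"))  # mismatched brackets are rejected
```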

I maintain a list in the README (https://github.com/SonOfLilit/kleenexp#similar-works) of packages I consider competition. The serious ones that I hope might actually win are melody and pomsky. You'll notice all of your list is lumped under "There are many more eDSLs, but I will not list them as they are less relevant in my opinion".

I honestly just don't believe you can solve the problem I'm trying to solve for just a single language. You have to solve it for editor find & replace, for log monitoring system configuration, for database queries, etc.

What would need to be done before kleenexp is the sole winner of your research? Wondering if my dev roadmap lines up with your needs :)
