Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.99k stars 557 forks source link

Add valid_identifier_{pvn,sv} API functions #22769

Closed leonerd closed 1 week ago

leonerd commented 1 week ago

I've wanted to perform this kind of check from core code or XS modules for quite some time, but oddly we didn't appear to have an API for it until now.

There's likely places already in internal code that could be neatened by calling these functions. We can identify and fix those up later.

leonerd commented 1 week ago

I put these functions in toke.c just because they feel related to the Perl tokenizer - i.e. it's all about the rules that perl applies to perl source code. But there's no firm reason for them to live there, if elsewhere would be better.

leonerd commented 1 week ago

It also occurs to me that maybe identifiers should be restricted to every character being in the same script. This word pауpаl is not what you likely think. All but one letter in it is Cyrillic. This could be used in an attack of pull requests of malicious source code. We already have a function that checks that a string is all from the same script. It could be called just before the return true, or it could easily be changed to have an extra parameter which would be true if the string also needed to be a legal identifier

That could be a useful behaviour to have somewhere, perhaps with an optional flag. But my intention with this function was simply to match what the parser does, for purposes of using it to address this comment: https://github.com/Perl/perl5/pull/22765#issuecomment-2487367902

leonerd commented 1 week ago

Addressed the latest round of comments, and pushed again. Will need a squash before I merge it (I kept the fixes as separate commits to view them easier).

Leont commented 1 week ago

It sounds useful, except it's not clear to me when it would be useful. Do you have some examples?

tonycoz commented 1 week ago

It sounds useful, except it's not clear to me when it would be useful. Do you have some examples?

I don't know who or what you're responding to here.

nm

leonerd commented 1 week ago

It sounds useful, except it's not clear to me when it would be useful. Do you have some examples?

I'm about to add a call to it in the class field :reader attribute handler, to check that the requested method name is actually valid. Then the new :writer can do the same. Inspired by this PR comment - https://github.com/Perl/perl5/pull/22765#issuecomment-2487367902

khwilliamson commented 1 week ago

One set of my comments got lost. Having worked with these kinds of functions a lot, I believe the better API is to pass the end pointer instead of the length. That pointer typically must be calculated anyway, as this function now uses the length solely for that purpose, so just skip the intermediate value. I can give further reasons if this is not enough to convince you

leonerd commented 1 week ago

@khwilliamson Oh I see. Mm. I could add a _pve variant and put the real logic in there, and then have the _pvn and _sv versions just invoke that by calculations. I suspect having a _pvn in the API anyway would still be useful. It's still the expected pattern that the majority of existing functions all follow, so it would seem odd to not have that.

leonerd commented 1 week ago

@khwilliamson ^-- that latest commit then adds such a function.

leonerd commented 1 week ago

force-pushed to squash before merge