JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

`ispunct` differs from the function with the same name in C, could be worth a note #56680

Open inkydragon opened 4 days ago

inkydragon commented 4 days ago

Not sure if this is a bug. The behavior seems intentional, but Julia's ispunct is inconsistent with the C function of the same name, which is confusing.

Perhaps a warning could be added to clarify the inconsistency with C.

https://en.cppreference.com/w/cpp/string/byte/ispunct

julia> c = '+'
'+': ASCII/Unicode U+002B (category Sm: Symbol, math)

julia> ispunct(c)
false

julia> ( @ccall ispunct(c::Cchar)::Cint ) != 0
true

julia> c = '-'
'-': ASCII/Unicode U+002D (category Pd: Punctuation, dash)

julia> (ispunct(c), ( @ccall ispunct(c::Cchar)::Cint )!= 0)
(true, true)

  ispunct(c::AbstractChar) -> Bool

  Tests whether a character belongs to the Unicode general category Punctuation, i.e. a character whose category code
  begins with 'P'.

And more chars:

julia> for c in raw"""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
           ispunct(c) || println(c)
       end
$
+
<
=
>
^
`
|
~

Seelengrab commented 3 days ago

I don't see an issue with the inconsistency - the julia function is explicit in its documentation that it handles Unicode punctuation (consistent with the domain of Char) while the C++ function is concerned with the C locale setting (which, as your reference points out, by default considers + to be punctuation). This may or may not be the same thing.

KristofferC commented 3 days ago

I don't think there was any claim that this was an issue, just that explicitly pointing out the discrepancy might be a good idea.

Seelengrab commented 3 days ago

> I don't think there was any claim that this was an issue,

Well, GitHub calls this type of ticket an issue, so what else should I call it? 🤷

> just that explicitly pointing out the discrepancy might be a good idea.

Yes, and I was agreeing with/reaffirming OP that the two functions aren't even intended to do the same thing. Why should we point out that a similar function in a different programming language has different behavior? Should we also clarify that eval only evaluates julia expressions, and not LISP expressions?

From reading the two documentations, it should already be plainly clear that the two have different behavior.

Seelengrab commented 3 days ago

This came out a bit wrong - what I mean is that IMO there shouldn't be any expectation that the two functions behave the same, given the already existing differences in documentation between the two. So I'm partly questioning what such an additional text would add 😅

gbaraldi commented 3 days ago

I think that because it has the exact same name as the C function, it would be nice to make it explicitly clear to the user that the two aren't the same.

Keno commented 3 days ago

> julia> c = '-'
> '-': ASCII/Unicode U+002D (category Pd: Punctuation, dash)

Just pointing out that this is technically because that's not minus, but dash (although we do canonicalize them for julia input). That said, they are different characters:

julia> ispunct('−')
false
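
To make the distinction concrete, here is a minimal sketch that prints the code point and the `ispunct` result for the ASCII hyphen-minus and for U+2212 MINUS SIGN. The helper name `describe` is made up for illustration and is not part of Base; only `ispunct`, `UInt32`, `string`, and `uppercase` are Base functions.

```julia
# Hypothetical helper (not part of Base): code point plus ispunct result.
describe(c::Char) = (
    codepoint = "U+" * uppercase(string(UInt32(c), base = 16, pad = 4)),
    punct = ispunct(c),
)

hyphen_minus = '-'        # U+002D, the character typed on an ASCII keyboard
minus_sign   = '\u2212'   # U+2212 MINUS SIGN, a distinct character

println(describe(hyphen_minus))
println(describe(minus_sign))
```

The two characters look alike in many fonts, but only the first belongs to a Unicode punctuation category, so only it satisfies `ispunct`.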

stevengj commented 2 days ago

I think it would make sense to comment on this explicitly in the ispunct docs.

Should be an easy PR if someone wants to take a stab.
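
As a rough illustration of the discrepancy discussed in this thread, the sketch below defines a hypothetical `c_locale_ispunct` helper (not a Base function) that approximates C's `ispunct` in the default "C" locale. It assumes that this means printable ASCII that is neither a space nor alphanumeric. It then collects the ASCII characters on which the helper and Julia's `ispunct` disagree.

```julia
# Hypothetical helper, not part of Base: approximates C's ispunct in the
# default "C" locale (printable ASCII, excluding space, letters, and digits).
c_locale_ispunct(c::AbstractChar) =
    '!' <= c <= '~' && !isletter(c) && !isdigit(c)

# ASCII characters where the two classifications disagree.
disagreements = [c for c in Char(0x00):Char(0x7f) if c_locale_ispunct(c) != ispunct(c)]

println(join(disagreements))
```

This should print the same nine characters listed in the original report. All of them are ASCII symbols whose Unicode general category starts with 'S' rather than 'P'.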