SWI-Prolog / issues

Dummy repository for issue tracking

Unicode symbols, and a possible numerical paradox! #115

Closed Shyam-Has-Your-Anomaly-Mitigated closed 2 years ago

Shyam-Has-Your-Anomaly-Mitigated commented 2 years ago

/*
🖖 Hello World,

I'm studying Sanskrit at ANU; I'm trying to cheat, but some bugs ate my homework! (An idea crazy enough... Artificial Intelligence…)

This is a Unicode bug and a numeric bug; it has to be, iff it works for other alphabets, or I'm /illogical|innumerate/…

Maybe Raspberry Pi is outdated?
$ swipl --version
SWI-Prolog version 8.2.4 for armv7l-linux

Any compound combination needs to be single-quoted; I'm only interested in symbolic terms, so strings are ⒟evil⒮ from the low-level depths of the bare-metal Home For Infinite Losers, and I assume `Symbolic = Atomic` Until-Otherwise-Informed (`rev <<< IOU # as in yoU Owe I an explanation iff you want peer-reviewed consensus`).
These work:
?- A = '१२३'([०,१,२,३,४,५,६,७,८,९]).
?- A = ०([०,'१२३',४,५,६,७,८,९]).
?- A = ०([०,१,२,३,४,५,६,७,८,९]).
?- A = ०(०,'१२३',४,५,६,७,८,९).
?- A = ०(०,१,२,३,४,५,६,७,८,९).
?- A = [०,१,२,३,४,५,६,७,८,९].
?- A = [०,'१२३',४,५,६,७,८,९].
?- A = १.
These throw errors:
?- A = १२३([०,१,२,३,४,५,६,७,८,९]).
?- A = ०([०,१२३,४,५,६,७,८,९]).
?- A = ०(०,१२३,४,५,६,७,८,९).
?- A = [०,१२३,४,५,६,७,८,९].
?- A = १२३.
Please fix this for all Indic scripts; Sanskrit doesn't have its own script, and we're only learning Devanagari ∵ Westerners learned Sanskrit in the North, before The Wall ∘ Ice ∧ Fire Yajnas.

Another potential bug; either my code is wrong, or your numbers are broken.
Maybe there's a better way to do this, while you're at it? I understand why I did what I did, but my self-assessment is playing peek-a-boo; I'm somehow unifying in a spooky quantum /entangling|teleporting/ sort of way (without Qiskit).
*/
%?- A='०१२३४५६७८९',number(A,B,C,[]),number(AA,B,CC,[]),number(AAA,BBB,CC,[]). % works
%?- A=०१२३४५६७८९,number(A,B,C,[]),number(AA,B,CC,[]),number(AAA,BBB,CC,[]). % broken (see above)
%?- A='0123456789',number(A,B,C,[]),number(AA,B,CC,[]),number(AAA,BBB,CC,[]). % broken
%?- A=0123456789,number(A,B,C,[]),number(AA,B,CC,[]),number(AAA,BBB,CC,[]). % broken
number(N,[N]) --> numeral(N).
number(N,[N1|T]) --> {ground(N), atom_concat(N1,N2,N)}, numeral(N1), number(N2,T).
number(N,[H|T]) --> {\+(ground(N))}, numeral(H), number(N2,T), {atom_concat(H,N2,N)}.
numeral(N) --> [N], {numerals(NN), member(N,NN)}.
numerals([०,१,२,३,४,५,६,७,८,९]). % Alphabet.
numerals([0,1,2,3,4,5,6,7,8,9]). % Alphabet.
%numerals([0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f]). % Alphabet; supporting arbitrary bases in any script is kind of the whole point of this (except for exotic scripts, like Roman; I haven't started yet, but in youtu.be/jMxoGqsmk5Y, Matt Parker says they stop at (s/-1+/+1×/)3999, and the 3 looks like an M, and the 9s look like १ #Trimurti).
%numerals([0,1,2,3,4,5,6,7,8,9,'A','B','C','D','E','F']). % alphabet; this overly tedious life experience taught me lowercase is the way of religious zealots
/*
पुनर् मिलामः । १ । (Roman = "punar milāmaḥ | i |" (it translates to `वलर् मोर्घुलिस् । २ ।` (Roman = "valar morghulis | ii |")))
श्याम ॥ ३ ॥ (Roman = "śyāma ‖ iii ‖" (English = "shyam", but the English is pronounced "sharm"; notice the schwa /deletion|syncop(e|ation)/))
ps: i mentioned matt parker in my /first|last/ issue (github.com/swi-prolog/issues/issues/80); maybe i'll make it a habit…
pss: sorry for not following the official steps; i did this in my terminal emulator, and i'd rather typeset \xₔlᵃtₑx than markdown!
psss: i can't believe i stopped using prolog after `issue(80)`; all because i couldn't reliably plan any"-thing" over a century‽
*/
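A possible answer to the "better way" question, sketched under stated assumptions rather than as a fix to the code above: instead of a DCG that mixes integers and atoms, keep every numeral as a quoted single-character atom and let atom_chars/2 do the splitting. The names numeral_alphabet/1, numeral_digits/2 and in_alphabet/2 are hypothetical, and the sketch assumes SWI-Prolog reading a UTF-8 source file.

```prolog
:- encoding(utf8).
:- use_module(library(apply)).   % maplist/2

% Each alphabet is a list of quoted, single-character atoms, so its members
% unify with the atoms that atom_chars/2 and atom_concat/3 produce.
numeral_alphabet(['०','१','२','३','४','५','६','७','८','९']).
numeral_alphabet(['0','1','2','3','4','5','6','7','8','9']).

% numeral_digits(?Atom, ?Digits): Atom is a non-empty numeral whose digits,
% all drawn from a single alphabet, are the one-character atoms in Digits.
% Call it with Atom bound, or with Atom unbound and Digits a proper list.
numeral_digits(Atom, Digits) :-
    numeral_alphabet(Alphabet),
    (   atom(Atom)
    ->  atom_chars(Atom, Digits),
        Digits \== [],
        maplist(in_alphabet(Alphabet), Digits)
    ;   var(Atom),
        is_list(Digits),
        Digits \== [],
        maplist(in_alphabet(Alphabet), Digits),
        atom_chars(Atom, Digits)
    ).

% Membership test by unification: '1' is found, the integer 1 is not.
in_alphabet(Alphabet, Digit) :-
    memberchk(Digit, Alphabet).
```

An integer such as 123 is rejected by design (it is neither an atom nor a variable), so numerals have to be quoted first, which sidesteps the mismatch between integers and atoms. Mixing scripts, as in '1२', also fails because each numeral must come from one alphabet.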
JanWielemaker commented 2 years ago

There doesn't seem to be such a thing as a bug here. There can't be, as Prolog syntax is undefined for non-ASCII :smile: Anyway, १२३ is a sequence of digits, and the syntax for an unquoted atom is letter [letter | digit | "_"]*. A sequence of digits is thus not a valid atom. As being a letter or a digit is derived from the Unicode tables, this holds for any script.
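To make the digit classification concrete: Unicode places the ten Devanagari digits at the contiguous code points U+0966 to U+096F (general category Nd, decimal digit), in the same order as ASCII 0 to 9. A minimal sketch of that mapping, assuming SWI-Prolog and a UTF-8 source file; devanagari_digit/2 is a hypothetical helper, not a library predicate. Note that only the quoted form '१२३' is an atom.

```prolog
:- encoding(utf8).

% devanagari_digit(?Char, ?Weight): Char is the Devanagari digit whose
% numeric weight is Weight (0..9), based on the U+0966..U+096F block.
devanagari_digit(Char, Weight) :-
    (   var(Char)
    ->  between(0, 9, Weight),          % enumerate or check the weight
        Code is 0x0966 + Weight,
        char_code(Char, Code)
    ;   char_code(Char, Code),
        Code >= 0x0966,
        Code =< 0x096F,
        Weight is Code - 0x0966
    ).
```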

Prolog only defines semantics for Latin digits, parsing 0123 to the integer 123 (it cannot be distinguished from 123, i.e., 0123 == 123 succeeds). A sensible point of discussion could be what we should do with sequences of digits from other scripts (not preceded by a letter, because that turns them into an atom). I guess there are three options
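One conceivable treatment for digit sequences from another script (an assumption for illustration, not necessarily one of the options referred to above) is to give them the same integer semantics as Latin digits. The sketch below does this for a quoted atom of Devanagari digits only; devanagari_number/2 and accumulate_digit/3 are hypothetical names, SWI-Prolog is assumed, and the atom must be bound. As with 0123, a leading zero disappears in the integer.

```prolog
:- encoding(utf8).
:- use_module(library(apply)).   % foldl/4

% devanagari_number(+Atom, ?Value): Atom consists only of Devanagari digits
% and Value is the integer they denote in base 10.
devanagari_number(Atom, Value) :-
    atom_codes(Atom, Codes),
    Codes \== [],
    foldl(accumulate_digit, Codes, 0, Value).

% Acc0 is the value of the digits seen so far; each new digit shifts it
% one decimal place and adds the digit's weight (code point - U+0966).
accumulate_digit(Code, Acc0, Acc) :-
    Code >= 0x0966,
    Code =< 0x096F,
    Acc is Acc0 * 10 + (Code - 0x0966).
```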

It might be better to discuss this on our Discourse forum.

Shyam-Has-Your-Anomaly-Mitigated commented 2 years ago

i traced the bug to atom_concat/3 treating numbers as atoms while member/2 doesn't; so `numerals(['0','1','2','3','4','5','6','7','8','9']).` fixes my second bug, and i guess i'll just singly quote exotic numerals to fix my first bug, and maybe use macros to sugar the single quotes away (time to `dcg(prolog)`!!! :D); but i think atom_concat/3 and member/2 should agree, unless that's specified? where is your source? i imagine it's iso? i'll start datamining!
JanWielemaker commented 2 years ago

but i think atom_concat/3, and member/2, should agree

They do different things. atom_concat/3 defines the concatenation relation between three atoms. According to ISO, all arguments are variables or atoms. SWI-Prolog is a bit more relaxed, also accepting integers as input. The output arguments are always unified with atoms. This allows for e.g. atom_concat(a,42,X), which is quite useful. member/2 is based on unification, and 1 = '1' is never true in Prolog.
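A small SWI-Prolog sketch of that difference (concat_result/1, integer_is_not_atom/0 and latin_numeral/1 are hypothetical names chosen for illustration). It also shows why storing the numerals as quoted atoms, as proposed earlier in the thread, makes member/2 agree with what atom_concat/3 returns.

```prolog
:- use_module(library(lists)).   % member/2

% SWI-Prolog accepts the integer 42 as input, but the result is an atom.
concat_result(X) :-
    atom_concat(a, 42, X).

% member/2 works by unification, and the integer 1 never unifies with '1'.
integer_is_not_atom :-
    \+ member(1, ['1']).

% Numerals stored as quoted atoms unify with atom_concat/3 output.
latin_numeral(N) :-
    member(N, ['0','1','2','3','4','5','6','7','8','9']).
```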

Notice that atom_concat(X,Y,0123) gives X = '', Y = '123' as its first solution, because the 0 is lost in the translation to an integer.
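To see the lost zero directly, the hypothetical helper splits/2 below collects every split that atom_concat/3 enumerates for its third argument; a minimal sketch assuming SWI-Prolog, which accepts an integer in that position.

```prolog
% splits(+Term, -Splits): every Prefix-Suffix pair that atom_concat/3
% produces for Term, in enumeration order (shortest prefix first).
splits(Term, Splits) :-
    findall(Prefix-Suffix, atom_concat(Prefix, Suffix, Term), Splits).
```

splits(0123, S) yields four splits, starting with ''-'123', while splits('0123', S) yields five, because the quoted atom keeps its leading zero.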

Unfortunately, macros won't help much: they manipulate terms after parsing, so they cannot turn invalid syntax into legal syntax. They can only change the term.
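As an illustration of that limit, here is a minimal sketch assuming SWI-Prolog's user:term_expansion/2 hook: it rewrites a numerals/1 fact written with integers into the quoted-atom form, which is the kind of sugar mentioned earlier in the thread. It only works because 0 to 9 already parse as integers; text the reader rejects, such as an unquoted digit sequence it cannot read, never reaches the hook. The numerals/1 fact here is just an example.

```prolog
:- multifile user:term_expansion/2.
:- dynamic user:term_expansion/2.

% The loader calls this hook on each term it has already read; here it
% turns numerals([0,1,...]) into numerals(['0','1',...]).
user:term_expansion(numerals(Ints), numerals(Atoms)) :-
    is_list(Ints),
    maplist(integer, Ints),
    maplist(atom_number, Atoms, Ints).   % atom_number('5', 5) etc.

% Written with integers for convenience; stored with quoted atoms.
numerals([0,1,2,3,4,5,6,7,8,9]).
```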

Closing as this is not a bug. Quite likely some people on the forum are interested in these complicated script issues :smile:

JanWielemaker commented 2 years ago

This issue has been mentioned on SWI-Prolog. There might be relevant details there:

https://swi-prolog.discourse.group/t/unicode-symbols-and-a-possible-numerical-paradox/5533/1