SWI-Prolog / contrib-protobufs

An interface to Google Protocol Buffers (protobuf)
8 stars 7 forks source link

integer_zigzag can't handle large values #5

Closed kamahen closed 3 years ago

kamahen commented 3 years ago
integer_zigzag(A, 18446744073709551600).

ERROR: Arguments are not sufficiently instantiated

(The value is 0xfffffffffffffff0.)

kamahen commented 3 years ago

Similar problem with int32_codes(4294967295, Z)

kamahen commented 3 years ago

protobufs:fixed_int32//1 and protobufs:fixed_int64//1 have a work-around for this bug:

fixed_int32(X, [A0, A1, A2, A3 | Rest], Rest) :-
    (   nonvar(X), X > 0x7fffffff
    ->  X2 is -(X xor 0xffffffff + 1)
    ;   X2 = X
    ),
    int32_codes(X2, [A0, A1, A2, A3]).
fixed_int64(X, [A0, A1, A2, A3, A4, A5, A6, A7 | Rest], Rest) :-
    (   nonvar(X), X > 0x7fffffffffffffff
    ->  X2 is -(X xor 0xffffffffffffffff + 1)
    ;   X2 = X
    ),
    int64_codes(X2, [A0, A1, A2, A3, A4, A5, A6, A7]).
kamahen commented 3 years ago

The problem turns out to be that SWI-Prolog doesn't handle large 64-bit integers as 64-bits, but switches to GNU multiple precision arithmetic library (GMP) when larger than the value in max_tagged_integer, which is 72057594037927935 or 0x00ffffffffffffff. The C code treats failure of PL_get_int64 as instantiation_error, which isn't true for large numbers (it also does this for atoms, for example).

The solution seems to be: use float32_codes and float64_codes in C but rewrite the others using pure Prolog. E.g. (needs testing):

int32_codes(X, Codes) :-
    X =<  2147483647,  %  0x7fffffff
    X >= -2147483648,  % -0x7fffffff - 1 or 0x80000000 in 2s complement
    Codes = [C0,C1,C2,C3],
    (   integer(X)
    ->  C0 is  X        /\ 0xff,
        C1 is (X >>  8) /\ 0xff,
        C2 is (X >> 16) /\ 0xff,
        C3 is (X >> 24) /\ 0xff
    ;   X0 is C0 + (C1 << 8) + (C2 << 16) + (C3 << 24),
        (   C3 >= 0x80
        ->  X is - X0 + 1
        ;   X = X0
        )
    ).

(Actually, 32-bit should work fine but 64-bit is the problem, as is zigzag)

JanWielemaker commented 3 years ago

No. SWI-Prolog has three internal integer types: tagged, int64_t and GMP. Thus, only unsigned 64 bit integers with the high bit set (those that are negative when interpreted as signed) use GMP. To the outside the C API provides functions to handle uint64_t, so the whole thing is transparent, provided the system is compiled with GMP support.

kamahen commented 3 years ago

By experiment, that's not what I saw. I'll make some test cases for you.

Anyway, it's easy enough to program in pure Prolog (the code's already done but I haven't finished all the test cases). (And it'd be nice if there were a getbyte predicate ...)

How do I access tagged int, int64_t, GMP? Presumably there are 3 different predicates but I could only find PL_get_int64() and PL_get_int64_ex()

On Mon, Jun 21, 2021, 23:17 Jan Wielemaker @.***> wrote:

No. SWI-Prolog has three internal integer types: tagged, int64_t and GMP. Thus, only unsigned 64 bit integers with the high bit set (those that are negative when interpreted as signed) use GMP. To the outside the C API provides functions to handle uint64_t, so the whole thing is transparent, provided the system is compiled with GMP support.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/SWI-Prolog/contrib-protobufs/issues/5#issuecomment-865629975, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIIGNNO6RVM3GIS5FQHE3ZLTUATHZANCNFSM46T4ZSUA .

JanWielemaker commented 3 years ago

PL_get_uint64() and friends are missing. Well, the _ex version exist in disguise as PL_cvt_i_uint64(term_t, uint64_t *). I'll add these for consistency. And no, there is no direct access to the internal types (as it should be).

edit Added PL_get_uint64() and PL_get_uint64_ex() API functions.

kamahen commented 3 years ago

@JanWielemaker Please add PL_get_unsigned(), PL_get_unsigned_ex(), PL_unify_unsigned() (or PL_get_uint() etc -- I'm not sure of your naming conventions)

Elsewhere, I noticed that PL_unify_list_ncodes() uses char* instead of unsigned char* ... presumably this doesn't cause any problems for char values greater than 128 on some platforms? Or is there a different interface that should be used?

JanWielemaker commented 3 years ago

PL_get_unsigned_ex() is PL_cvt_i_uint(). For PL_unify_unsigned() either cast to an int of a larger size or use pl_unify_uint64(). For unifying with strings use PL_unify_chars(). The string operations handle the input as unsigned chars regardless of the platform.

kamahen commented 3 years ago

This is fixed in the next release (https://github.com/SWI-Prolog/contrib-protobufs/pull/10)