ChenHuajun / pg_roaringbitmap

RoaringBitmap extension for PostgreSQL
Apache License 2.0
223 stars 37 forks source link

Doesn't work on 32-bit and big-endian platforms #31

Open df7cb opened 7 months ago

df7cb commented 7 months ago

Hi, I'm working on packaging pg_roaringbitmap for apt.postgresql.org and Debian. Naturally, that includes building it for a variety of different machine architectures.

pg_roaringbitmap works on little-endian 64-bit platforms only. If that's intended, that's fine, but maybe we can do better and support the rest as well.

32-bit x86

On 32-bit x86, the problem is actually quite small:

**** regression.diffs ****
--- /<<PKGBUILDDIR>>/expected/roaringbitmap.out 2022-06-28 18:19:36.000000000 +0000
+++ /<<PKGBUILDDIR>>/results/roaringbitmap.out  2024-03-19 14:05:37.720980272 +0000
@@ -815,13 +815,13 @@
 select rb_is_empty('{1}');
  rb_is_empty 
 -------------
- f
+ t
 (1 row)

 select rb_is_empty('{1,10,100}');
  rb_is_empty 
 -------------
- f
+ t
 (1 row)

 select rb_cardinality(NULL);

The other regression tests pass. (I tried poking around in the source, but only figured that return rb->size == 0; in roaring_buffer_is_empty yields the wrong result, not where the problem is.)

On 32-bit, there, is a warning that is easy to fix:

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -g -O2 -Werror=implicit-function-declaration -ffile-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -std=c99 -Wno-error=maybe-uninitialized -Wno-declaration-after-statement -I. -I./ -I/usr/include/postgresql/10/server -I/usr/include/postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include/mit-krb5  -c -o roaringbitmap.o roaringbitmap.c
In file included from roaringbitmap.c:1:
roaringbitmap.h: In function ‘pg_aligned_malloc’:
roaringbitmap.h:57:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   57 |     p = (void *)((((uint64)porg + alignment) / alignment) * alignment);
      |                    ^
roaringbitmap.h:57:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
   57 |     p = (void *)((((uint64)porg + alignment) / alignment) * alignment);
      |         ^
roaringbitmap.h:58:47: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   58 |     *((unsigned char *)p-1) = (unsigned char)((uint64)p - (uint64)porg);
      |                                               ^
roaringbitmap.h:58:59: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   58 |     *((unsigned char *)p-1) = (unsigned char)((uint64)p - (uint64)porg);
      |                                                           ^
roaringbitmap.h: In function ‘pg_aligned_free’:
roaringbitmap.h:66:21: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   66 |     porg = (void *)((uint64)memblock - *((unsigned char *)memblock-1));
      |                     ^
roaringbitmap.h:66:12: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
   66 |     porg = (void *)((uint64)memblock - *((unsigned char *)memblock-1));
      |            ^
roaringbitmap.h:68:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   68 |         porg = (void *)((uint64)porg - 256);
      |                         ^
roaringbitmap.h:68:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
   68 |         porg = (void *)((uint64)porg - 256);
      |                ^

This can be fixed by using size_t instead of uint64. (I'll open a PR for that in a minute.)

big-endian s390x

On big-endian architectures, the problems are much worse:

**** regression.diffs ****
diff -U3 /<<PKGBUILDDIR>>/expected/roaringbitmap.out /<<PKGBUILDDIR>>/results/roaringbitmap.out
--- /<<PKGBUILDDIR>>/expected/roaringbitmap.out 2022-06-28 18:19:36.000000000 +0000
+++ /<<PKGBUILDDIR>>/results/roaringbitmap.out  2024-03-19 14:05:10.319049606 +0000
@@ -57,13 +57,13 @@
 select  '{}'::roaringbitmap;
    roaringbitmap    
 --------------------
- \x3a30000000000000
+ \x0000303a00000000
 (1 row)

 select  '{ -1 ,  2  , 555555 ,  -4  }'::roaringbitmap;
                                    roaringbitmap                                    
 ------------------------------------------------------------------------------------
- \x3a300000030000000000000008000000ffff01002000000022000000240000000200237afcffffff
+ \x0000303a000000030000000000080000ffff000100000020000000220000002400027a23fffcffff
 (1 row)

 set roaringbitmap.output_format='array';
@@ -74,17 +74,13 @@
 (1 row)

 select '\x3a30000000000000'::roaringbitmap;
- roaringbitmap 
----------------
- {}
-(1 row)
-
+ERROR:  bitmap format is error
+LINE 1: select '\x3a30000000000000'::roaringbitmap;
+               ^
 select '\x3a300000030000000000000008000000ffff01002000000022000000240000000200237afcffffff'::roaringbitmap;
-  roaringbitmap   
-------------------
- {2,555555,-4,-1}
-(1 row)
-
+ERROR:  bitmap format is error
+LINE 1: select '\x3a300000030000000000000008000000ffff01002000000022...
+               ^
 -- Exception
 select  ''::roaringbitmap;
 ERROR:  malformed bitmap literal
@@ -142,19 +138,19 @@
 select '{}'::roaringbitmap::bytea;
        bytea        
 --------------------
- \x3a30000000000000
+ \x0000303a00000000
 (1 row)

 select '{1}'::roaringbitmap::bytea;
                  bytea                  
 ----------------------------------------
- \x3a3000000100000000000000100000000100
+ \x0000303a0000000100000000000000100001
 (1 row)

 select '{1,9999}'::roaringbitmap::bytea;
                    bytea                    
 --------------------------------------------
- \x3a30000001000000000001001000000001000f27
+ \x0000303a0000000100000001000000100001270f
 (1 row)

 select '{}'::roaringbitmap::bytea::roaringbitmap;
@@ -1582,1092 +1578,7 @@
 (1 row)

 select rb_fill('{}',0,2);
- rb_fill 
----------
- {0,1}
-(1 row)

... and then the server crashes.

(Again, one possible fix here is to just document that big-endian would not be supported.)

Thanks for considering.

lemire commented 7 months ago

CRoaring supports 32-bit and even big endian systems (see https://github.com/RoaringBitmap/CRoaring/blob/master/.github/workflows/s390x.yml).

See our documentation:

Screenshot 2024-03-19 at 11 32 59 AM

If you need big-endian support, we invite you to contribute to the following issue regarding serialization: https://github.com/RoaringBitmap/CRoaring/issues/423

It is a simple matter of flipping the bytes when reading and writing.