EnterpriseDB / mysql_fdw

PostgreSQL foreign data wrapper for MySQL
Other
521 stars 160 forks source link

Fix regression tests for different fetch size on 32-bit platforms #227

Open df7cb opened 2 years ago

df7cb commented 2 years ago

Net diff between server_options.out and server_options_1.out is:

--- expected/server_options.out 2021-10-01 16:19:20.320972814 +0200 +++ expected/server_options_1.out 2021-10-01 16:22:04.428838428 +0200 @@ -235,11 +235,11 @@

-- Negative test cases for fetch_size option, should error out. ALTER FOREIGN TABLE table30000 OPTIONS ( SET fetch_size '-60000'); -ERROR: "fetch_size" requires an integer value between 1 to 18446744073709551615 +ERROR: "fetch_size" requires an integer value between 1 to 4294967295 ALTER FOREIGN TABLE table30000 OPTIONS ( SET fetch_size '123abc'); -ERROR: "fetch_size" requires an integer value between 1 to 18446744073709551615 +ERROR: "fetch_size" requires an integer value between 1 to 4294967295 ALTER FOREIGN TABLE table30000 OPTIONS ( SET fetch_size '999999999999999999999'); -ERROR: "fetch_size" requires an integer value between 1 to 18446744073709551615 +ERROR: "fetch_size" requires an integer value between 1 to 4294967295 -- Cleanup fetch_size test objects. DROP FOREIGN TABLE table30000; DROP SERVER fetch101;

df7cb commented 2 years ago

This fixes the regression tests on 32-bit platforms.

A more correct fix would be to make the max fetch size int64_max instead of INT_MAX, but that's more invasive.

jeevanladhe commented 2 years ago

Thanks, Christoph for your patch. But as of now, we do not test mysql_fdw on 32-bit platforms.

mysql_fdw runs as a part of the PostgreSQL or EDB Postgres Advanced Server. PostgresSQL server also actively tests only a couple of 32-bit platforms[1]. EDB Postgres Advanced Server does not support 32-bit platforms starting version 11 onwards[2].

We do not have any near-term plans to do active testing and certify mysql_fdw for 32-bit platforms. But, having said that, if we notice that a substantial user base is looking forward to the same then we may consider the same.

Regards, Jeevan Ladhe

[1] https://buildfarm.postgresql.org/cgi-bin/show_status.pl [2] https://www.enterprisedb.com/product-compatibility

df7cb commented 2 years ago

Even if you don't test it there, you could still merge it?

jeevanladhe commented 2 years ago

Even if you don't test it there, you could still merge it?

Well, that would fetch some additional maintenance every time we make changes or add some additional test to server_options.sql. Every time we do so, we will have to add the expected output to this new file as well, and will also have to make sure this test also passes. For doing that we will need to set up a 32-bit machine. This collectively indirectly implies we do half of the job if we would have to call it official support.

Regards, Jeevan Ladhe

df7cb commented 2 years ago

Alternatively, you could drop the offending test (the 3 fetch size errors), or fix the variable to always be 64 bit.

df7cb commented 2 years ago

Fwiw the 32-bit problem got worse with 2.7.0:

13:20:40 --- /tmp/autopkgtest.Hiip7V/tree/expected/aggregate_pushdown_4.out 2021-11-18 08:59:41.000000000 +0000
13:20:40 +++ /tmp/autopkgtest.Hiip7V/tree/results/aggregate_pushdown.out    2021-12-06 12:20:40.513243199 +0000
13:20:40 @@ -629,11 +629,8 @@
13:20:40  (7 rows)
13:20:40  
13:20:40  SELECT sum(c2) * (random() <= 1)::int AS sum FROM fdw132_t1 ORDER BY 1;
13:20:40 - sum 
13:20:40 ------
13:20:40 - 300
13:20:40 -(1 row)
13:20:40 -
13:20:40 +WARNING:  failed to resolve name __mulodi4
13:20:40 +ERROR:  failed to look up symbol "evalexpr_19_0"

That's on PG12 and 13 where the remaining tests pass; on PG14 it's even worse since the connection gets terminated because of a JIT error.

I haven't read the code yet, but this smells like a bug.

df7cb commented 2 years ago

The same regression diff appears on armhf with PG14:

diff -U3 /tmp/autopkgtest-lxc.ar7wrbxy/downtmp/build.DUL/src/expected/aggregate_pushdown.out /tmp/autopkgtest-lxc.ar7wrbxy/downtmp/build.DUL/src/results/aggregate_pushdown.out
--- /tmp/autopkgtest-lxc.ar7wrbxy/downtmp/build.DUL/src/expected/aggregate_pushdown.out 2021-11-18 08:59:41.000000000 +0000
+++ /tmp/autopkgtest-lxc.ar7wrbxy/downtmp/build.DUL/src/results/aggregate_pushdown.out  2021-12-08 22:29:30.341358246 +0000
@@ -629,11 +629,8 @@
 (7 rows)

 SELECT sum(c2) * (random() <= 1)::int AS sum FROM fdw132_t1 ORDER BY 1;
- sum 
------
- 300
-(1 row)
-
+WARNING:  failed to resolve name __mulodi4
+ERROR:  failed to look up symbol "evalexpr_19_0"
 -- LATERAL join, with parameterization
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT c2, sum FROM fdw132_t1 t1, lateral (SELECT sum(t2.c1 + t1.c1) sum FROM fdw132_t2 t2 GROUP BY t2.c1) qry WHERE t1.c2 * 2 = qry.sum and t1.c2 > 10 ORDER BY 1;
### End 14 installcheck (FAILED with exit code 1) ###
cpaelzer commented 2 years ago

I just wanted to state that (for obvious reasons) this affects Ubuntu just as much.

After reading the discussion so far I wanted to ask - as it seems armhf (or 32bit in general) shall not be supported - if you'd want us (= the Distributions) to disable building and providing mysql_fdw on armhf?

df7cb commented 2 years ago

"failed to resolve name" and "failed to look up symbol" are from PG's jit machinery.

https://github.com/postgres/postgres/blob/a18b6d2dc288dfa6e7905ede1d4462edd6a8af47/src/backend/jit/llvm/llvmjit.c#L1100-L1102

@anarazel, is that a bug in PG?

df7cb commented 2 years ago

https://codesearch.debian.net/search?q=mulodi4&literal=1 has some insights that this is a general problem with clang on 32-bit:

webkit2gtk_2.34.3-1/Source/WTF/wtf/CheckedArithmetic.h

/* On Linux with clang, libgcc is usually used instead of compiler-rt, and it does
 * not provide the __mulodi4 symbol used by clang for __builtin_mul_overflow
anarazel commented 2 years ago

@df7cb Hm. If you compile a c program using __builtin_mul_overflow on those platforms with clang, does it end up using libgcc? Does it make a difference if you add --rtlib=libgcc or --rtlib=compiler-rt to the clang invocations?

df7cb commented 2 years ago

It does not, but my test might be too simple:

[25] 10:08 cbe@sid-i386.lehmann:~/overflow 1j $ cat overflow.c 
int mul(int a, int b, int *c)
{
    __builtin_mul_overflow(a, b, c);
    return *c;
}

int main ()
{
    int a;
    mul(5, 5, &a);
    return a;
}
[0] 10:12 cbe@sid-i386.lehmann:~/overflow 1j $ clang --rtlib=libgcc -Wall -O2 overflow.c -lgcc
[0] 10:12 cbe@sid-i386.lehmann:~/overflow 1j $ ./a.out ; echo $?
25
[0] 10:12 cbe@sid-i386.lehmann:~/overflow 1j $ ldd a.out 
    linux-gate.so.1 (0xf7fb2000)
    libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7db2000)
    /lib/ld-linux.so.2 (0xf7fb4000)
[0] 10:12 cbe@sid-i386.lehmann:~/overflow 1j $ clang --version
Debian clang version 13.0.1-+rc1-1~exp4
Target: i386-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

--rtlib doesn't make a difference.

anarazel commented 2 years ago

@df7cb Oh, I thought this was arm specific... Do you see failures on 32bit x86?

I think the example fails to fail to generate a reference to mulodi only because a) the overflow path isn't actually used b) it's only 32bit.

$ cat /tmp/test_mul.c
#include <stdint.h>
#include <stdbool.h>

bool mul(int64_t a, int64_t b, int64_t *c)
{
  return __builtin_mul_overflow(a, b, c);
}

$ clang -m32 -Wall -O2 /tmp/test_mul.c -o - -S|grep mulodi
    calll   __mulodi4
df7cb commented 2 years ago

I've seen it on both i386 and armhf.

Your code now shows the problem in the 32-bit chroot:

[25] 10:51 cbe@sid-i386.lehmann:~/overflow 1j $ cat test_mul.c 
#include <stdint.h>
#include <stdbool.h>

bool mul(int64_t a, int64_t b, int64_t *c)
{
  return __builtin_mul_overflow(a, b, c);
}

int main()
{
    int64_t c;
    mul(5, 5, &c);
    return c;
}
[0] 10:51 cbe@sid-i386.lehmann:~/overflow 1j $ clang -m32 -Wall -O2 test_mul.c -c
[0] 10:52 cbe@sid-i386.lehmann:~/overflow 1j $ clang -Wall -O2 test_mul.o
/usr/bin/ld: test_mul.o: in function `mul':
test_mul.c:(.text+0x2f): undefined reference to `__mulodi4'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
[1] 10:52 cbe@sid-i386.lehmann:~/overflow 1j $ clang --rtlib=libgcc -Wall -O2 test_mul.o
/usr/bin/ld: test_mul.o: in function `mul':
test_mul.c:(.text+0x2f): undefined reference to `__mulodi4'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
[1] 10:52 cbe@sid-i386.lehmann:~/overflow 1j $ clang --rtlib=compiler-rt -Wall -O2 test_mul.o
[0] 10:52 cbe@sid-i386.lehmann:~/overflow 1j $ ldd a.out 
    linux-gate.so.1 (0xf7f7c000)
    libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7d7c000)
    /lib/ld-linux.so.2 (0xf7f7e000)
anarazel commented 2 years ago

It appears to be getting fixed in clang. See https://bugs.llvm.org/show_bug.cgi?id=28629 - sid's clang-14 doesn't fail anymore.

As the above example quite clearly shows, this is unrelated to postgres / jit...

df7cb commented 2 years ago

Thanks @anarazel !