YottaDB / YDB

Mirrored from https://gitlab.com/YottaDB/DB/YDB
Other
76 stars 37 forks source link

funny return value from External Call (mixing Simple API & Call-Ins) #351

Closed pkoper closed 5 years ago

pkoper commented 5 years ago

Sometimes s rc=$&func() sets rc to a value that is shown in ZWR as 0 but is undefined.

Code:

call()
  s rc=$&func()
  w "-- zwr shows rc=0 --",!
  zwr
  w "-- $g(rc) --",!
  w "rc=",$g(rc),!
  w "-- accessing rc --",!
  w "rc=",rc,!
  q

sometimes generates:

-- zwr shows rc=0 --
rc=0
result="ok"
-- $g(rc) --
rc=
-- accessing rc --
rc=Error occurred: 150373850,call+7^test,%YDB-E-LVUNDEF, Undefined local variable: rc

instead of:

-- zwr shows rc=0 --
rc=0
result="ok"
-- $g(rc) --
rc=0
-- accessing rc --
rc=0

I've tried to isolate the problem as far as possible. The smallest possible code to reproduce it is here: haisenbug.zip

This is kind of heisenbug - e.g. when you change s b=c to s a=c in test.m it disappears.

My environment is:

Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux
Distributor ID: Debian
Description:    Debian GNU/Linux 9.5 (stretch)
Release:        9.5
Codename:       stretch

YottaDB r1.22 Linux x86_64

nars1 commented 5 years ago

I am not able to reproduce your variable output (instead I get a SIG-11 and a core file always with a debug build and the correct output always with a production build). But I noticed the call-in table for test1 specifies an input parameter but no such parameter is supplied in the actual ydb_ci call in prog.c.

callin.tab
-----------
test1:  void test1^test(I:gtm_char_t*)

prog.c
-------
  ydb_ci("test1");

Once I fix ydb_ci("test1") to say ydb_ci("test1","abcd"), I do not see the SIG-11 any more.

Can you try this out in your environment to see if your variable output also goes away with this change?

It would be still better to avoid the SIG-11 in the debug build in case the actual # of parameters specified differ from those specified in the call-in table but I will create a separate issue (on gitlab which is where we maintain our repository now) for that once you confirm the above.

nars1 commented 5 years ago

In thinking about it some more, I realize I cannot create a separate issue if the # of actual parameters specified in ydb_ci() call differ from that specified in the call-in table. There is no way to know the actual # of parameters specified without changing the ydb_ci() interface. Additionally, there is already documentation that indicates the parameters and types should match (at https://docs.yottadb.com/ProgrammersGuide/extrout.html#call-in-interface).

Optional list of arguments to be passed to the M routine’s formallist: the number of arguments and the type of each argument must match the number of parameters, and parameter types specified in the corresponding Call-In table entry. All pointer arguments must be pre-allocated. YottaDB assumes that any pointer, which is passed for O/IO-parameter points to valid write-able memory.

We will document that ydb_ci() can cause SIG-11s or weird-symptoms/non-deterministic-output (like what you are seeing) if the correct # of parameters are not specified.

pkoper commented 5 years ago

I have the same problem for ydb_ci("test1"), ydb_ci("test1","abcd") and void test1^test() + ydb_ci("test1"). I'll try to replicate this on some virtual machine image. This problem is not connected with wrong number of arguments.

I think SIGSEGV on wrong number of arguments (or wrong types of arguments) for ydb_ci(p) is acceptable. Foreign Function Interfaces in other solutions work the same way. You can try to detect wrong number of arguments (or types) if you replace ydb_ci() with a macro with a sentinel:

#define SENTINEL 0x7077ADB
#define ydb_ci(name, ...)  ydb_ci_sentinel(name, __VA_ARGS_, SENTINEL)

You know from callin.tab where to expect SENTINEL in ydb_ci_sentinel() arguments. If it isn't there, the wrong number of arguments was passed to ydb_ci(). (you should keep old ydb_ci() function for ABI backward compatibility)

pkoper commented 5 years ago

Maybe it's a compiler issue - I have gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) on Debian 9.5 (stretch).

pkoper commented 5 years ago

I have reproduced the problem on Debian 9 image from osboxes.org (SourceForge download link) with YottaDB r1.22 Linux x86_64.

nars1 commented 5 years ago

Thanks. While downloading from the Sourceforge link, I get the below error. Any ideas how to work around this?

503 No Backend Servers Available No backend servers were available to answer your request. Please notify the site admins.

pkoper commented 5 years ago

Maybe choosing another mirror will help you - this option is hidden behind "Problems downloading?" button.

nars1 commented 5 years ago

The download started fine now. Maybe the site was down for a maintenance window previously. It is a 1GB file so it will take a few minutes to finish. Will keep you posted.

nars1 commented 5 years ago

I am now logged into the VM. Is How do I run your test case steps? Do I install r122 and your test programs to verify this or are they pre-installed on this VM?

pkoper commented 5 years ago

You have to install YottaDB and test programs on your own.

pkoper commented 5 years ago

The exact instructions to reproduce the problem (with and without additional argument for ydb_ci("test1")).

# login as osboxes/osboxes.org
su -    # osboxes.org
vi /etc/apt/sources.list  # add:  deb http://deb.debian.org/debian/ stretch main contrib
apt-get update
apt-get install make gcc
wget https://gitlab.com/YottaDB/DB/YDB/raw/master/sr_unix/ydbinstall.sh
chmod +x ydbinstall.sh
./ydbinstall --installdir /opt/ydb --utf8 default --force-install
logout  # switch back to osboxes user
wget https://github.com/YottaDB/YDB/files/2728529/haisenbug.zip
unzip haisenbug.zip
. /opt/ydb/gtmprofile
make run
vi prog.c  # change ydbci("test1"); to ydbci("test1","abcd");
make run
nars1 commented 5 years ago

Thanks. Interestingly, I was able to get a SIG-11 today with the additional argument for ydb_ci("test1") with a debug build. I still am not able to get the UNDEF error in a pro build like you do but suspect the two are related so will first investigate this.

pkoper commented 5 years ago

I have also reproduced the problem using YottaDB Docker container:

unzip haisenbug.tgz
vi prog.c  # replace ydb_ci("test1"); with ydb_ci("test1","abcd");
docker pull yottadb/yottadb
docker run -it -v `pwd`:/data yottadb/yottadb
zsy "bash"
make run

My host system is Debian 9.5 (stretch)

nars1 commented 5 years ago

Thanks. Will confirm the fix on the docker container once I get there. It is an issue with creating new variable names using the Simple API while inside an external call. I am still investigating but know enough to suggest a workaround. In your case, the "result" variable which is set inside the ydb_set_s() call is created for the very first time inside the ydb_set_s(). If you create the variable by say doing a set result="abcd" before doing the external call (as shown below), I think you will avoid the UNDEF error. Can you confirm this.

test.m

call()
  set result="abcd"
  s rc=$&func()
pkoper commented 5 years ago

If I remove ydb_ci("echo") from prog.c the problem disappears. If I create a single letter variable x, the problem still exists:

call()
  w "-- zwr shows nothing --",!
  zwr
  s x="abcd"
  w "-- zwr shows x=""abcd"" --",!
  zwr
  s rc=$&func()
  w "-- zwr shows rc=0 --",!
  zwr
  w "-- $g(rc) --",!
  w "rc=",$g(rc),!
  w "-- accessing rc --",!
  w "rc=",rc,!
  q

outputs:

-- zwr shows nothing --
-- zwr shows x="abcd" --
x="abcd"
-- zwr shows rc=0 --
rc=0
result="ok"
x="abcd"
-- $g(rc) --
rc=
-- accessing rc --
rc=Error occurred: 150373850,call+12^test,%YDB-E-LVUNDEF, Undefined local variable: rc

But if I replace set x="abcd" with set y="abcd" the problem disappears. You can create a variable with any longer name and the problem disappears. If you replace YDB_BUFFER_STR("result") with YDB_BUFFER_STR("x") in plugin.c and use set x="abcd" instead of set result="abcd" the problem still exists. Therefore I don't think it would be a good workaround.

nars1 commented 5 years ago

Not surprisingly, I am not able to reproduce the UNDEF with set x="abcd" either.

That aside, there are two UNDEF errors happening in the first two ydb_ci() calls ("v" in test1 and "c" in echo). Those look like application errors to me.

test1(a)
  s x=v
  q

echo(a)
  w "zyre=",$zyre,!
  s b=c
  q

Can you avoid those errors (or ensure they are handled by setting a $etrap to clear $ecode to "" etc.) and see if the problem still exists.

pkoper commented 5 years ago

Thank you for the suggestion on exception handling. The haisenbug.tgz sample is based on a larger C+M code base, where the UNDEF problem is frequent and no M exceptions are raised (I believe). Please, take haisenbug.tgz only as a carefully prepared sample to reproduce the problem. Call-In+External Call scenario in my code base does not match the sample.

nars1 commented 5 years ago

Sure. Will let you know when I have an update on the issue raised by your sample program.

nars1 commented 5 years ago

I have a potential fix for your issue. Would you be able to build from source to verify it works?

pkoper commented 5 years ago

Sure.

nars1 commented 5 years ago

Great. Go to https://gitlab.com/YottaDB/DB/YDB/merge_requests/499. You can clik on the "Check out branch" link in that url and that should download the git repo for you. Let me know if you have any questions. Note that this fixes the SIG-11 I am seeing. I don't yet know if there are more issues since I don't yet see the UNDEF that you see.

nars1 commented 5 years ago

Also can you try running your test with r1.22 after setting the ydb__dbglvl (or gtmdbglvl) environment variable to the value 64 (this turns on an extra memory verification step). I expect you to see a SIG-11 just like I do in the debug build. The UNDEF, in my understanding, is a symptom of how uninitialized memory (which was not correctly initialized in case of a hashtable expansion when a new M local variable name gets added while inside an external call) acts up differently on different systems.

pkoper commented 5 years ago

I have compiled nars1/YDB-sig11 branch:

$ git branch
  master
* nars1/YDB-sig11
$

But I still get UNDEF:

$ make run
gcc -Wall -o prog prog.c -fPIC -L/opt/ydb-351 -lyottadb -Wl,-rpath,/opt/ydb-351 -I/opt/ydb-351
gcc -Wall -shared -o plugin.so plugin.c -fPIC -L/opt/ydb-351 -lyottadb -I/opt/ydb-351
./prog
zyre=YottaDB r1.23 Linux x86_64
Error occurred: 150373850,echo+2^test,%YDB-E-LVUNDEF, Undefined local variable: c
Error occurred: 150373850,test1+1^test,%YDB-E-LVUNDEF, Undefined local variable: v
-- zwr shows rc=0 --
rc=0
result="ok"
-- $g(rc) --
rc=
-- accessing rc --
rc=Error occurred: 150373850,call+7^test,%YDB-E-LVUNDEF, Undefined local variable: rc
$

Other YottaDB installations in my Debian 9 system are r1.22 (not r1.23 as in output above), so I think the issue is still not resolved.

nars1 commented 5 years ago

Thanks. I will try to reproduce the issue in the Docker container like you had mentioned before.

pkoper commented 5 years ago

With gtmdbglvl=64 on r1.22 (not r1.23) I have core dump:

Program terminated with signal SIGQUIT, Quit.
#0  0x00007f9f25844227 in kill () at ../sysdeps/unix/syscall-template.S:78
78      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x00007f9f25844227 in kill () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007f9f25a25e68 in gtm_dump_core () from /opt/ydb/libyottadb.so
#2  0x00007f9f25a27a8d in gtm_fork_n_core () from /opt/ydb/libyottadb.so
#3  0x00007f9f25a26562 in generic_signal_handler () from /opt/ydb/libyottadb.so
#4  <signal handler called>
#5  0x00007f9f25b94146 in op_fnget () from /opt/ydb/libyottadb.so
#6  0x00007f9f259dc5e8 in ?? ()
#7  0x000000000000000c in ?? ()
#8  0x00007f9f260df060 in ?? () from /lib64/ld-linux-x86-64.so.2
#9  0x00007f9f24dedda0 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#10 0x00007ffe47ea68e0 in ?? ()
#11 0x00007f9f260e0170 in audit_list_string ()
#12 0x00007f9f25940b9d in __GI___dl_iterate_phdr (callback=0x559ffb76ab60, data=0x559ffb76ab90) at dl-iteratephdr.c:82
#13 0x0000000000000000 in ?? ()
(gdb)

With gtmdbglvl=64 on r1.23 (nars1/YDB-sig11) I have core dump:

Program terminated with signal SIGQUIT, Quit.
#0  __pthread_kill (threadid=<optimized out>, signo=3) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
57      ../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=3) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
#1  0x00007f610c58c468 in gtm_dump_core () at /root/ydb/sig11/YDB/sr_unix/gtm_dump_core.c:72
#2  0x00007f610c58c780 in gtm_fork_n_core () at /root/ydb/sig11/YDB/sr_unix/gtm_fork_n_core.c:148
#3  0x00007f610c58f72f in generic_signal_handler (sig=<optimized out>, info=<optimized out>, context=<optimized out>)
    at /root/ydb/sig11/YDB/sr_unix/generic_signal_handler.c:353
#4  <signal handler called>
#5  op_fnget () at /root/ydb/sig11/YDB/sr_x86_64/op_fnget.s:41
#6  0x00007f610c27b5e8 in ?? ()
#7  0x00007f610c7429d0 in ?? ()
#8  0x00007ffe4dfbb370 in ?? ()
#9  0x000000000000000e in ?? ()
#10 0x00007f610c76c060 in ?? () from /lib64/ld-linux-x86-64.so.2
#11 0x00007f610b852da0 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#12 0x00007ffe4dfbb400 in ?? ()
#13 0x00007f610c76d170 in audit_list_string ()
#14 0x00007f610c1dfb9d in __GI___dl_iterate_phdr (callback=0x56284ba3eb80, data=0x56284ba3ebb0) at dl-iteratephdr.c:82
#15 0x0000000000000000 in ?? ()
(gdb)
nars1 commented 5 years ago

You would have gotten a KILLBYSIGSINFO1 message in the output. Can you paste the full output of that for both r1.22 and r1.23.

pkoper commented 5 years ago

r1.22:

%YDB-F-KILLBYSIGSINFO1, YottaDB process 24448 has been killed by a signal 11 at address 0x00007F8BCDD40146 (vaddr 0x0000000000000000)

r1.23 (nars1/YDB-sig11):

%YDB-F-KILLBYSIGSINFO1, YottaDB process 24514 has been killed by a signal 11 at address 0x00007FD5697E66C6 (vaddr 0x0000000000000000)
nars1 commented 5 years ago

Progress. I am able to reproduce the UNDEF issue with the docker container.

nars1 commented 5 years ago

But I am not able to reproduce the UNDEF with the fixed r1.23 version. What is the output of the following in your r1.23 environment.

ldd prog
env | grep LD_LIBRARY_PATH
ls -l $ydb_dist/libyottadb.so
pkoper commented 5 years ago

Updated Makefile:

export GTMCI=$(CURDIR)/callin.tab
export GTMXC=$(CURDIR)/extcall.tab
export plugin=$(CURDIR)/plugin.so
export gtmroutines=
export ydb_routines=$(CURDIR)($(CURDIR))
export gtmdbglvl=64

all: prog plugin.so

prog: prog.c
        gcc -Wall -o prog prog.c -fPIC -L$(ydb_dist) -lyottadb -Wl,-rpath,$(ydb_dist) -I$(ydb_dist)

plugin.so: plugin.c
        gcc -Wall -shared -o plugin.so plugin.c -fPIC -L$(ydb_dist) -lyottadb -I$(ydb_dist)

run: all
        set | grep dbglvl
        ldd prog
        strings /opt/ydb-351/libyottadb.so | grep 'YottaDB r1'
        set | grep LD_LIBRARY_PATH
        ./prog

clean:
        rm -f prog plugin.so test.o

Output from make run:

set | grep dbglvl
gtmdbglvl='64'
ldd prog
        linux-vdso.so.1 (0x00007fffe1568000)
        libyottadb.so => /opt/ydb-351/libyottadb.so (0x00007f489f28f000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f489f0c0000)
        libelf.so.1 => /usr/lib/x86_64-linux-gnu/libelf.so.1 (0x00007f489eea8000)
        libncurses.so.6 => /lib/x86_64-linux-gnu/libncurses.so.6 (0x00007f489ee7f000)
        libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f489ee51000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f489ecbd000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f489ecb6000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f489ec95000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f489ec8b000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f489f959000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f489ea71000)
strings /opt/ydb-351/libyottadb.so | grep 'YottaDB r1'
YottaDB r1.23 Linux x86_64
set | grep LD_LIBRARY_PATH
LD_LIBRARY_PATH='/opt/ydb-351:'
./prog
zyre=YottaDB r1.23 Linux x86_64
Error occurred: 150373850,echo+2^test,%YDB-E-LVUNDEF, Undefined local variable: c
Error occurred: 150373850,test1+1^test,%YDB-E-LVUNDEF, Undefined local variable: v
-- zwr shows rc=0 --
rc=0
result="ok"
-- $g(rc) --
rc=
%YDB-F-KILLBYSIGSINFO1, YottaDB process 25172 has been killed by a signal 11 at address 0x00007FF26DB6C6C6 (vaddr 0x0000000000000000)
nars1 commented 5 years ago

Thanks. This keeps being elusive. I tried various combinations in the docker container. And was able to see the UNDEF or SIG-11 reliably without the fix and do not see either with the fix. Are you able to reproduce the SIG-11 in a docker container with r1.23?

nars1 commented 5 years ago

I have the binaries with the fix (just to eliminate the rare possibility that the build you are using as r1.23 does not contain the fixes). Can you please download this and try this out for your experiment. Once you install this, I expect you to see the following output. That will confirm you are using the right binaries.

YDB>write $zreldate
20190106 16:24

yottadb_r123_linux_x8664_pro.tar.gz

pkoper commented 5 years ago

Great! The UNDEF problem has disappeared and I'm not able to reproduce the issue with the sample, thank you.

I see that in ydb_subscript_next_s() started to return YDB_ERR_NODEEND instead of YDB_OK when there are no more subscripts, so I cannot check my code base right now (without refactoring).

pkoper commented 5 years ago

I have checked my code base, everything is ok now :) Is your build suitable for production use?

nars1 commented 5 years ago

Thanks for reporting the issue confirming the fix works. https://gitlab.com/YottaDB/DB/YDB/issues/401 has been created on gitlab (our official issue tracker) for this. Regarding use in production, I cannot guarantee that as this fix has not gone through our in-house testing. Given you are planning to use this in production, I would like to encourage you to consider buying YottaDB support (you can send email to support@yottadb.com).