Drup / ocaml-lmdb

OCaml bindings for LMDB.
https://drup.github.io/ocaml-lmdb/dev/
MIT License

Segfault when using Cursor.go #15

Closed: constfun closed this issue 5 years ago

constfun commented 5 years ago

The following program segfaults in the middle of the cursor transaction. The resulting database can be dumped with the mdb_dump program without issues. Tested with ocaml-lmdb pinned to master, on macOS, with lmdb version 0.9.23:

let () =
  (* Create env and db *)
  let kibi = 8 * 1024 in
  let mibi = kibi * 1024 in
  let env = Lmdb.Env.create ~map_size:(10 * mibi) ~max_dbs:1 "testenv" in
  let t = Lmdb.Db.create ~create:true env "testdb" in
  (* Put some entries *)
  let rec put_count t = function
    | 0 -> ()
    | count ->
        let value_bytes = Bytes.make (10 * kibi) '1' in
        Lmdb.Db.put t (string_of_int count) (Bytes.to_string value_bytes) ;
        put_count t (count - 1)
  in
  let count = 250 in
  put_count t count ;
  assert ((Lmdb.Db.stats t).entries = count) ;
  (* Iterate using cursor and print keys *)
  Lmdb.Db.Cursor.go Lmdb.ro t (fun cur ->
      (* Triggering GC here also SEGFAULTs *) 
      (* Gc.full_major () ; *)
      let rec print_keys = function
        | 0 -> ()
        | count ->
            let key, _ = Lmdb.Db.Cursor.next cur in
            print_endline key ;
            print_keys (count - 1)
      in
      print_keys count )

Note: triggering a GC inside the cursor callback (the commented-out Gc.full_major () call in the code above) also segfaults.
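For anyone reading the reproduction program, note that kibi is bound to 8 * 1024 rather than 1024, so the sizes are larger than the names suggest. The sketch below only redoes the arithmetic from the program; it does not touch LMDB, and the names map_size, value_size and total_value_bytes are introduced here for illustration.

```ocaml
(* Sizes exactly as defined in the reproduction program. *)
let kibi = 8 * 1024        (* 8192 bytes, not 1024 *)
let mibi = kibi * 1024     (* 8_388_608 bytes *)

(* Hypothetical names, introduced only to label the quantities. *)
let map_size = 10 * mibi             (* passed as ~map_size *)
let value_size = 10 * kibi           (* length of each stored value *)
let count = 250                      (* number of entries written *)
let total_value_bytes = count * value_size

let () =
  Printf.printf "map_size          = %d bytes\n" map_size;
  Printf.printf "value_size        = %d bytes\n" value_size;
  Printf.printf "total_value_bytes = %d bytes\n" total_value_bytes
```

So the program writes 250 values of 81,920 bytes each (about 20.5 MB of value data) into a map of 83,886,080 bytes.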

I tried to get to the bottom of it, but wasn't successful.

My best guess is that the OCaml GC attempts to free the memory pointed to by the Bigarray created here:
https://github.com/Drup/ocaml-lmdb/blob/2e163adbb5d23bee9c657f545d358b6ce8d39593/src/lmdb.ml#L373

That would violate the rule documented here:
http://www.lmdb.tech/doc/group__mdb.html#ga8bf10cd91d3f3a83a34d04ce6b07992d

As a hack, I tried adding all values returned from this function to a list ref to prevent them from being collected, but that didn't help.
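For reference, the "list ref" hack described above looks roughly like the following. This is a minimal standalone sketch, not the code that was actually patched into lmdb.ml: the names keep_alive and retain are hypothetical, and the buffers here are ordinary OCaml-allocated Bigarrays rather than ones wrapping LMDB's memory map.

```ocaml
open Bigarray

type buf = (char, int8_unsigned_elt, c_layout) Array1.t

(* Hypothetical global root: every buffer consed onto this list stays
   reachable, so the GC will not collect it. *)
let keep_alive : buf list ref = ref []

(* Record a buffer in the global list and hand it back unchanged. *)
let retain (b : buf) : buf =
  keep_alive := b :: !keep_alive;
  b

let () =
  for _ = 1 to 3 do
    ignore (retain (Array1.create char c_layout 16))
  done;
  (* The buffers survive a full major collection because they are
     still reachable through keep_alive. *)
  Gc.full_major ();
  Printf.printf "retained %d buffers\n" (List.length !keep_alive)
```

Keeping the values reachable only stops the GC from collecting the Bigarray values themselves. It says nothing about whether the memory they point to is still valid, which may be why the hack made no difference here.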

constfun commented 5 years ago

Crash dump:

Process:               cursor_test.exe [2232]
Path:                  /Users/USER/*/cursor_test.exe
Identifier:            cursor_test.exe
Version:               0
Code Type:             X86-64 (Native)
Parent Process:        zsh [1924]
Responsible:           cursor_test.exe [2232]
User ID:               501

Date/Time:             2019-02-03 14:34:20.999 -0500
OS Version:            Mac OS X 10.14.2 (18C54)
Report Version:        12
Anonymous UUID:        77738151-97CC-6E8E-DE97-A2E250A199CA

Sleep/Wake UUID:       A8BFE8DB-4280-429A-95DF-789F30DE4749

Time Awake Since Boot: 650000 seconds
Time Since Wake:       1600 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x000000010de1e00c
Exception Note:        EXC_CORPSE_NOTIFY

Termination Signal:    Segmentation fault: 11
Termination Reason:    Namespace SIGNAL, Code 0xb
Terminating Process:   exc handler [2232]

VM Regions Near 0x10de1e00c:
    __LINKEDIT             000000010ddd9000-000000010de02000 [  164K] r--/rwx SM=COW  /usr/lib/dyld
--> 
    MALLOC_TINY            00007fd3b4400000-00007fd3b4700000 [ 3072K] rw-/rwx SM=PRV  

Application Specific Information:
dyld2 mode

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   liblmdb.dylib                   0x000000010a7e3339 mdb_cursor_next + 182
1   liblmdb.dylib                   0x000000010a7e2ed3 mdb_cursor_get + 98
2   cursor_test.exe                 0x000000010a60a5e4 lmdb_stub__46_mdb_cursor_get + 36 (lmdb_cstubs.c:346)
3   cursor_test.exe                 0x000000010a58d646 camlLmdb_generated__fun_3087 + 86
4   cursor_test.exe                 0x000000010a59118d camlLmdb__get_prim_3047 + 125
5   cursor_test.exe                 0x000000010a5851e7 camlCursor_test__print_keys_1340 + 55
6   cursor_test.exe                 0x000000010a590ebf camlLmdb__fun_6027 + 127
7   cursor_test.exe                 0x000000010a58f16f camlLmdb__go_2563 + 319
8   cursor_test.exe                 0x000000010a58f258 camlLmdb__trivial_2576 + 136
9   cursor_test.exe                 0x000000010a585323 camlCursor_test__entry + 243
10  cursor_test.exe                 0x000000010a582669 caml_program + 1481
11  cursor_test.exe                 0x000000010a63b010 caml_start_program + 92
12  cursor_test.exe                 0x000000010a61c88c caml_startup_common + 524 (startup.c:157)
13  cursor_test.exe                 0x000000010a61c8fb caml_main + 11 (startup.c:167)
14  cursor_test.exe                 0x000000010a61c96c main + 12 (main.c:45)
15  libdyld.dylib                   0x00007fff6d42bed9 start + 1

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x0000000000000008  rcx: 0x0000000000000000  rdx: 0x0000000000000001
  rdi: 0x00007fd3b4403510  rsi: 0x00007fd3b4600000  rbp: 0x00007ffee567e550  rsp: 0x00007ffee567e520
   r8: 0x000000010a7e398f   r9: 0x00007fd3b4600000  r10: 0x0000000000000001  r11: 0x000000010a6e6080
  r12: 0x00007fd3b4403510  r13: 0x000000010de1e000  r14: 0x00007fd3b4600000  r15: 0x00007fd3b4600010
  rip: 0x000000010a7e3339  rfl: 0x0000000000010246  cr2: 0x000000010de1e00c

Logical CPU:     4
Error Code:      0x00000004
Trap Number:     14

Binary Images:
       0x10a581000 -        0x10a661ff7 +cursor_test.exe (0) <D8B90ED9-0610-3EB4-B235-023E303573CF> /Users/USER/*/cursor_test.exe
       0x10a7de000 -        0x10a7ecfff +liblmdb.dylib (0) <58316010-045C-3CF4-9B40-519BBAD458A1> /usr/local/opt/lmdb/lib/liblmdb.dylib
       0x10a7f7000 -        0x10a7fbff7 +libffi.6.dylib (0) <986DC089-727D-3563-935B-652AB8C97396> /usr/local/opt/libffi/lib/libffi.6.dylib
       0x10dd21000 -        0x10dd9fa67  dyld (640.2) <289AB27E-F09F-3384-A14A-100431139559> /usr/lib/dyld
    0x7fff6a8a0000 -     0x7fff6a8a1ffb  libSystem.B.dylib (1252.200.5) <25F4A1F5-6551-312F-B397-C8D45ACC530A> /usr/lib/libSystem.B.dylib
    0x7fff6aafa000 -     0x7fff6ab51ff7  libc++.1.dylib (400.9.4) <B260AC33-EB9A-30C6-8746-D011B3B02B08> /usr/lib/libc++.1.dylib
    0x7fff6ab52000 -     0x7fff6ab67fff  libc++abi.dylib (400.17) <446F4748-8A89-3D2E-AE1C-27EEBE93A8AB> /usr/lib/libc++abi.dylib
    0x7fff6c347000 -     0x7fff6cacdfe7  libobjc.A.dylib (750.1) <804715F4-F52D-34D0-8FEC-A25DC08513C3> /usr/lib/libobjc.A.dylib
    0x7fff6d2ad000 -     0x7fff6d2b1ff3  libcache.dylib (81) <704331AC-E43D-343A-8C24-39201142AF27> /usr/lib/system/libcache.dylib
    0x7fff6d2b2000 -     0x7fff6d2bcff3  libcommonCrypto.dylib (60118.220.1) <9C865644-EE9A-3662-AB77-7C8A5E561784> /usr/lib/system/libcommonCrypto.dylib
    0x7fff6d2bd000 -     0x7fff6d2c4fff  libcompiler_rt.dylib (63.4) <817772E3-E836-3FFD-A39B-BDCD1C357221> /usr/lib/system/libcompiler_rt.dylib
    0x7fff6d2c5000 -     0x7fff6d2ceff3  libcopyfile.dylib (146.200.3) <5C5C4F35-DAB7-3CF1-940F-F47192AB8289> /usr/lib/system/libcopyfile.dylib
    0x7fff6d2cf000 -     0x7fff6d353fdf  libcorecrypto.dylib (602.230.1) <C78D1A87-5543-3561-BEB4-3B480BA94ECB> /usr/lib/system/libcorecrypto.dylib
    0x7fff6d3da000 -     0x7fff6d414ff7  libdispatch.dylib (1008.220.2) <2FDB1401-5119-3DF0-91F5-F4E105F00CD7> /usr/lib/system/libdispatch.dylib
    0x7fff6d415000 -     0x7fff6d444ff3  libdyld.dylib (640.2) <376E3F3A-6942-3B0E-AD5E-4B97E8255CF5> /usr/lib/system/libdyld.dylib
    0x7fff6d445000 -     0x7fff6d445ffb  libkeymgr.dylib (30) <A4EFD9A4-2EF3-3E18-B325-F527E3821939> /usr/lib/system/libkeymgr.dylib
    0x7fff6d453000 -     0x7fff6d453ff7  liblaunch.dylib (1336.220.5) <8563299C-2493-3DBD-8E88-3FC673DB47DD> /usr/lib/system/liblaunch.dylib
    0x7fff6d454000 -     0x7fff6d459fff  libmacho.dylib (921) <6ADB99F3-D142-3A0A-B3CE-031354766ACC> /usr/lib/system/libmacho.dylib
    0x7fff6d45a000 -     0x7fff6d45cffb  libquarantine.dylib (86.220.1) <58524FD7-63C5-38E0-9D90-845A79551C14> /usr/lib/system/libquarantine.dylib
    0x7fff6d45d000 -     0x7fff6d45eff3  libremovefile.dylib (45.200.2) <BA53CA8A-9974-3A43-9265-B110B1AE470F> /usr/lib/system/libremovefile.dylib
    0x7fff6d45f000 -     0x7fff6d476ff3  libsystem_asl.dylib (356.200.4) <33C62769-1242-3BC1-9459-13CBCDECC7FE> /usr/lib/system/libsystem_asl.dylib
    0x7fff6d477000 -     0x7fff6d477fff  libsystem_blocks.dylib (73) <152EDADF-7D94-35F2-89B7-E66DCD945BBA> /usr/lib/system/libsystem_blocks.dylib
    0x7fff6d478000 -     0x7fff6d500fff  libsystem_c.dylib (1272.200.26) <D6C701A2-9F17-308D-B6AC-9E17EF31B7DF> /usr/lib/system/libsystem_c.dylib
    0x7fff6d501000 -     0x7fff6d504ff7  libsystem_configuration.dylib (963.200.27) <94898525-ECC8-3CC9-B312-CBEAAC305E32> /usr/lib/system/libsystem_configuration.dylib
    0x7fff6d505000 -     0x7fff6d508ff7  libsystem_coreservices.dylib (66) <10818C17-70E1-328E-A3E3-C3EB81AEC590> /usr/lib/system/libsystem_coreservices.dylib
    0x7fff6d509000 -     0x7fff6d50fffb  libsystem_darwin.dylib (1272.200.26) <07468CF7-982F-37C4-83D0-D5E602A683AA> /usr/lib/system/libsystem_darwin.dylib
    0x7fff6d510000 -     0x7fff6d516ff7  libsystem_dnssd.dylib (878.230.2) <FF9D5025-F060-334B-B6D8-C5D0BB6A55E3> /usr/lib/system/libsystem_dnssd.dylib
    0x7fff6d517000 -     0x7fff6d563ff3  libsystem_info.dylib (517.200.9) <54B65F21-2E93-3579-9B72-6637A03245D9> /usr/lib/system/libsystem_info.dylib
    0x7fff6d564000 -     0x7fff6d58cff7  libsystem_kernel.dylib (4903.231.4) <ABDAABCA-C22A-3960-AA4E-E91A9FF34929> /usr/lib/system/libsystem_kernel.dylib
    0x7fff6d58d000 -     0x7fff6d5d8ff7  libsystem_m.dylib (3158.200.7) <AF25F8E8-194C-314F-A2D3-A424853EE796> /usr/lib/system/libsystem_m.dylib
    0x7fff6d5d9000 -     0x7fff6d5fdff7  libsystem_malloc.dylib (166.220.1) <4777DC06-F9C6-356E-82AB-86A1C6D62F3A> /usr/lib/system/libsystem_malloc.dylib
    0x7fff6d5fe000 -     0x7fff6d609ff3  libsystem_networkextension.dylib (767.220.1) <74818C3D-9B68-3823-A737-6A4B782618F2> /usr/lib/system/libsystem_networkextension.dylib
    0x7fff6d60a000 -     0x7fff6d611fff  libsystem_notify.dylib (172.200.21) <65B3061D-41D7-3485-B217-A861E05AD50B> /usr/lib/system/libsystem_notify.dylib
    0x7fff6d612000 -     0x7fff6d61bfef  libsystem_platform.dylib (177.200.16) <83DED753-51EC-3B8C-A98D-883A5184086B> /usr/lib/system/libsystem_platform.dylib
    0x7fff6d61c000 -     0x7fff6d626fff  libsystem_pthread.dylib (330.230.1) <BA382BFC-6A17-3940-B417-D090EF2AF4F4> /usr/lib/system/libsystem_pthread.dylib
    0x7fff6d627000 -     0x7fff6d62aff7  libsystem_sandbox.dylib (851.230.3) <4D0CB1CA-160C-3C29-BE5D-131D68D43B1B> /usr/lib/system/libsystem_sandbox.dylib
    0x7fff6d62b000 -     0x7fff6d62dff3  libsystem_secinit.dylib (30.220.1) <5964B6D2-19D4-3CF9-BDBC-4EB1D42348F1> /usr/lib/system/libsystem_secinit.dylib
    0x7fff6d62e000 -     0x7fff6d635ff7  libsystem_symptoms.dylib (820.237.2) <487E1794-4C6E-3B1B-9C55-95B1A5FF9B90> /usr/lib/system/libsystem_symptoms.dylib
    0x7fff6d636000 -     0x7fff6d64bff7  libsystem_trace.dylib (906.220.1) <4D4BA88A-FA32-379D-8860-33838723B35F> /usr/lib/system/libsystem_trace.dylib
    0x7fff6d64d000 -     0x7fff6d652ffb  libunwind.dylib (35.4) <EF1A77FD-A86B-39F5-ABEA-6100AB23583A> /usr/lib/system/libunwind.dylib
    0x7fff6d653000 -     0x7fff6d683fff  libxpc.dylib (1336.220.5) <DC50F33E-C47D-3256-BFE0-F8E9B5AEBE17> /usr/lib/system/libxpc.dylib

External Modification Summary:
  Calls made by other processes targeting this process:
    task_for_pid: 0
    thread_create: 0
    thread_set_state: 0
  Calls made by this process:
    task_for_pid: 0
    thread_create: 0
    thread_set_state: 0
  Calls made by all processes on this machine:
    task_for_pid: 3403134
    thread_create: 0
    thread_set_state: 0

VM Region Summary:
ReadOnly portion of Libraries: Total=229.5M resident=0K(0%) swapped_out_or_unallocated=229.5M(100%)
Writable regions: Total=37.7M written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=37.7M(100%)

                                VIRTUAL   REGION 
REGION TYPE                        SIZE    COUNT (non-coalesced) 
===========                     =======  ======= 
Kernel Alloc Once                    8K        2 
MALLOC                            29.3M       10 
MALLOC guard page                   16K        4 
STACK GUARD                       56.0M        2 
Stack                             8192K        2 
__DATA                            2992K       43 
__LINKEDIT                       216.8M        6 
__TEXT                            12.7M       42 
shared memory                        8K        3 
===========                     =======  ======= 
TOTAL                            325.7M      105 
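
A small observation on the report above, written as a sketch rather than a diagnosis: the faulting address equals the value in r13 plus 12, and it lies in the gap between the two regions listed under "VM Regions Near". All constants below are copied from the crash report; the variable names are introduced here for illustration.

```ocaml
(* Values copied from the crash report. *)
let fault_addr   = 0x10de1e00c     (* cr2 / KERN_INVALID_ADDRESS *)
let r13          = 0x10de1e000     (* register r13 at the time of the crash *)
let linkedit_end = 0x10de02000     (* end of the dyld __LINKEDIT region *)
let malloc_start = 0x7fd3b4400000  (* start of the MALLOC_TINY region *)

(* True when the address falls between the two listed regions. *)
let in_gap = fault_addr >= linkedit_end && fault_addr < malloc_start

let () =
  Printf.printf "fault_addr - r13 = %d\n" (fault_addr - r13);
  Printf.printf "in unmapped gap  = %b\n" in_gap
```

This prints an offset of 12 and true, which is consistent with mdb_cursor_next reading a field at a small offset from a pointer that does not refer to mapped memory. It does not show where that pointer came from.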

Drup commented 5 years ago

hmm, @madroach ?

madroach commented 5 years ago

"My best guess is that the OCaml GC attempts to free the memory pointed to by the Bigarray created here: ocaml-lmdb/src/lmdb.ml, line 373 in 2e163ad"

As far as I can see, ctypes doesn't set the CAML_BA_MANAGED flag in bigarray_of_ptr, so the GC should not try to free the pointer in the bigarray. I don't feel motivated to hunt this bug and would rather propose getting rid of ctypes completely first.

constfun commented 5 years ago

Thank you for taking a look.

@madroach Out of curiosity, why get rid of ctypes and what alternative do you have in mind?

My tentative plan is to skip the high level interface and use the low level bindings directly...

madroach commented 5 years ago

"why get rid of ctypes and what alternative do you have in mind?"

Because of performance issues and the difficulty of tracing memory management issues like 6a37d7f4 and the one you reported. The alternative is #16. Would you like the bindings in #16 (module Mdb in src/lmdb.ml) to be exported?