BlueBrain / nmodl

Code Generation Framework For NEURON MODeling Language
https://bluebrain.github.io/nmodl/
Apache License 2.0

Investigate segfault when calling `h.<function>_<suffix>` #1283

Open JCGoran opened 1 month ago

JCGoran commented 1 month ago

Below is a preliminary write-up about a segfault caused by calling `h.<function>_<suffix>` from a Python script.

Take the following mod file:

```
: minimal.mod
NEURON {
    SUFFIX minimal
}

FUNCTION f() {
    f = 1
}
```

Compiling it with NMODL works:

```
$ nrnivmodl -nmodl $(which nmodl) minimal.mod
[NMODL][warning] Code generation with NMODL is pre-alpha, lacks features and is intended only for development use
/Users/jelic/software/nmodl-clean/test/usecases/empty
cfiles =
Mod files: "minimal.mod"

Creating 'arm64' directory for .o files.

MODOBJS= ./minimal.o
 -> Compiling mod_func.cpp
 -> NMODL ../minimal.mod
 -> Compiling /arm64/minimal.cpp
 => LINKING shared library "/arm64/./libnrnmech.dylib"
 => LINKING executable "/arm64/./special" LDFLAGS are:
ld: warning: ignoring duplicate libraries: '-lnrnmech'
Successfully created arm64/special
```

Unfortunately, running the following Python script:

```python
# sim.py
from neuron import h
s = h.Section()
s.insert("minimal")
h.f_minimal()
```

causes a segfault when run via `nrniv sim.py` (running `python sim.py` is equivalent, but makes debugging more cumbersome). Running it under the LLDB debugger reveals:

```
(lldb) run
Process 39954 launched: '/nrn/build-arm64/install/bin/nrniv' (arm64)
NEURON -- VERSION 9.0a-243-g30b42a1b8+ master (30b42a1b8+) 2024-05-14
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

loading membrane mechanisms from arm64/.libs/libnrnmech.so
Additional mechanisms from files
 "minimal.mod"
Process 39954 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000001001dceb8 libnrnmech.so`double* neuron::cache::MechanismRange<1ul, 0ul>::data_array<0, 1>(this=0x000000016fdfdfd8, instance=0) at mechanism_range.hpp:87:26
   84       [[nodiscard]] double* data_array(std::size_t instance) {
   85           static_assert(variable < NumFloatingPointFields);
   86           // assert(array_size == m_data_array_dims[variable]);
-> 87           return std::next(m_data_ptrs[variable], array_size * (m_offset + instance));
   88       }
   89
   90       template <int variable, int array_size>
Target 0: (nrniv) stopped.
```

The full backtrace is:

```
  * frame #0: 0x00000001001dceb8 libnrnmech.so`double* neuron::cache::MechanismRange<1ul, 0ul>::data_array<0, 1>(this=0x000000016fdfdfd8, instance=0) at mechanism_range.hpp:87:26
    frame #1: 0x00000001001dce90 libnrnmech.so`double* neuron::cache::MechanismRange<1ul, 0ul>::fpfield_ptr<0>(this=0x000000016fdfdfd8) at mechanism_range.hpp:109:16
    frame #2: 0x00000001001dc99c libnrnmech.so`neuron::make_instance_minimal(_ml=0x000000016fdfdfd8) at minimal.cpp:102:26
    frame #3: 0x00000001001df8dc libnrnmech.so`neuron::_hoc_f() at minimal.cpp:182:21
    frame #4: 0x0000000101c47d30 libnrniv.dylib`hoc_call() at code.cpp:1418:9
    frame #5: 0x0000000101d2dae0 libnrniv.dylib`fcall(vself=0x0000000100af6b70, vargs=0x000000010011c040) at nrnpy_hoc.cpp:728:9
    frame #6: 0x0000000101b8a700 libnrniv.dylib`OcJump::fpycall(f=(libnrniv.dylib`fcall(void*, void*) at nrnpy_hoc.cpp:671), a=0x0000000100af6b70, b=0x000000010011c040) at ocjump.cpp:138:16
    frame #7: 0x0000000101d2cd98 libnrniv.dylib`hocobj_call(self=0x0000000100af6b70, args=0x000000010011c040, kwrds=0x0000000000000000) at nrnpy_hoc.cpp:796:45
    frame #8: 0x00000001003b180c Python`_PyObject_MakeTpCall + 132
    frame #9: 0x000000010048b57c Python`call_function + 268
    frame #10: 0x0000000100486124 Python`_PyEval_EvalFrameDefault + 22388
    frame #11: 0x000000010047fb6c Python`_PyEval_EvalCode + 416
    frame #12: 0x00000001004cc458 Python`run_eval_code_obj + 136
    frame #13: 0x00000001004cc388 Python`run_mod + 112
    frame #14: 0x00000001004cada8 Python`pyrun_file + 168
    frame #15: 0x00000001004ca7e4 Python`pyrun_simple_file + 252
    frame #16: 0x00000001004ca6a8 Python`PyRun_SimpleFileExFlags + 80
    frame #17: 0x0000000101d274c0 libnrniv.dylib`nrnpy_pyrun(fname="sim.py") at nrnpython.cpp:134:26
    frame #18: 0x0000000101c71f0c libnrniv.dylib`hoc_moreinput() at hoc.cpp:1133:14
    frame #19: 0x0000000101c71a08 libnrniv.dylib`hoc_main1(argc=2, argv=0x000000016fdfeed0, envp=0x000000016fdfeee8) at hoc.cpp:917:16
    frame #20: 0x00000001017d9784 libnrniv.dylib`ivocmain_session(argc=2, argv=0x000000016fdfeed0, env=0x000000016fdfeee8, start_session=1) at ivocmain.cpp:744:23
    frame #21: 0x00000001017d909c libnrniv.dylib`ivocmain(argc=2, argv=0x000000016fdfeed0, env=0x000000016fdfeee8) at ivocmain.cpp:349:12
    frame #22: 0x0000000100003b0c nrniv`main(argc=2, argv=0x000000016fdfeed0, env=0x000000016fdfeee8) at nrnmain.cpp:71:12
```

The issue seems to be that we are trying to dereference a nullptr:

```
(lldb) p m_data_ptrs
(double *const *) 0x0000000000000000
```
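Because `m_data_ptrs` is null and `variable` is 0, evaluating `m_data_ptrs[variable]` reads from address 0x0 before `std::next` is even applied, which matches the `EXC_BAD_ACCESS (code=1, address=0x0)` stop reason. Below is a minimal sketch of that arithmetic with a debug-only guard, assuming a simplified stand-in type: `RangeSketch` and `has_data` are hypothetical names, not part of NEURON, and only the body of `data_array` mirrors the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical, simplified stand-in for neuron::cache::MechanismRange.
// Only the pointer arithmetic of data_array() mirrors the lldb listing;
// this is not NEURON's implementation.
struct RangeSketch {
    double* const* m_data_ptrs{nullptr};  // one base pointer per floating-point field
    std::size_t m_offset{0};              // row of the first instance in this range

    bool has_data() const { return m_data_ptrs != nullptr; }

    template <int variable, int array_size>
    double* data_array(std::size_t instance) {
        // Debug-only guard. Without it, m_data_ptrs[variable] is in practice
        // loaded from address 0x0 + variable * sizeof(double*) when the range
        // was never populated, i.e. exactly 0x0 for variable == 0.
        assert(has_data() && "mechanism range has no data pointers");
        return std::next(m_data_ptrs[variable], array_size * (m_offset + instance));
    }
};
```

With `m_offset = 1`, `data_array<0, 1>(0)` resolves to the second element of the field's storage; calling it on a default-constructed `RangeSketch` trips the assertion in debug builds instead of faulting at address 0x0.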

Going up a couple of frames:

```
(lldb) up
frame #1: 0x00000001001dce90 libnrnmech.so`double* neuron::cache::MechanismRange<1ul, 0ul>::fpfield_ptr<0>(this=0x000000016fdfdfd8) at mechanism_range.hpp:109:16
   106
   107      template <int variable>
   108      [[nodiscard]] double* fpfield_ptr() {
-> 109          return data_array<variable, 1>(0);
   110      }
   111
   112      /**
(lldb) up
frame #2: 0x00000001001dc99c libnrnmech.so`neuron::make_instance_minimal(_ml=0x000000016fdfdfd8) at minimal.cpp:102:26
   99
   100      static minimal_Instance make_instance_minimal(_nrn_mechanism_cache_range& _ml) {
   101          return minimal_Instance {
-> 102              _ml.template fpfield_ptr<0>()
   103          };
   104      }
   105
(lldb) up
frame #3: 0x00000001001df8dc libnrnmech.so`neuron::_hoc_f() at minimal.cpp:182:21
   179          _ppvar = _local_prop ? _nrn_mechanism_access_dparam(_local_prop) : nullptr;
   180          _thread = _extcall_thread.data();
   181          _nt = nrn_threads;
-> 182          auto inst = make_instance_minimal(_ml_real);
   183          _r = f_minimal(_ml, inst, id, _ppvar, _thread, _nt);
   184          hoc_retpushx(_r);
   185      }
```

Note that `_ml_real` doesn't contain any data:

```
(lldb) p _ml_real
(_nrn_mechanism_cache_instance) {
  neuron::cache::MechanismRange<1, 0> = {
    m_data_ptrs = 0x0000000000000000
    m_data_array_dims = 0x0000000000000000
    m_pdata_ptrs = 0x0000000000000000
    m_offset = 18446744073709551615
  }
  m_dptr_cache = (__elems_ = "")
  m_dptr_datums = (__elems_ = "")
}
```
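The `m_offset` value in this dump is not arbitrary: 18446744073709551615 is 2^64 - 1, the largest 64-bit unsigned value, which is what -1 becomes when converted to a 64-bit `std::size_t`. That suggests a "no row" sentinel rather than a real instance index, an interpretation not verified against NEURON's sources. The compile-time checks below confirm the arithmetic; `is_valid_row` is a hypothetical helper, not a NEURON function.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// The value lldb printed for m_offset in _ml_real.
constexpr std::uint64_t printed_offset = 18446744073709551615ULL;

// It is the largest unsigned 64-bit value, i.e. -1 converted to unsigned.
static_assert(printed_offset == std::numeric_limits<std::uint64_t>::max(),
              "printed offset is UINT64_MAX");
static_assert(printed_offset == static_cast<std::uint64_t>(-1),
              "printed offset is -1 converted to unsigned");

// Hypothetical helper (not part of NEURON): treat that value as "no row".
constexpr bool is_valid_row(std::size_t row) {
    return row != static_cast<std::size_t>(-1);
}
```

So even if `m_data_ptrs` were non-null, `m_offset + instance` would point far outside any real storage for this instance.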

The entire definition of `_hoc_f` is as follows:

```cpp
static void _hoc_f(void) {
    double _r{};
    Datum* _ppvar;
    Datum* _thread;
    NrnThread* _nt;
    Prop* _local_prop = _prop_id ? _extcall_prop : nullptr;
    _nrn_mechanism_cache_instance _ml_real{_local_prop};
    auto* const _ml = &_ml_real;
    size_t const id{};
    _ppvar = _local_prop ? _nrn_mechanism_access_dparam(_local_prop) : nullptr;
    _thread = _extcall_thread.data();
    _nt = nrn_threads;
    auto inst = make_instance_minimal(_ml_real);
    _r = f_minimal(_ml, inst, id, _ppvar, _thread, _nt);
    hoc_retpushx(_r);
}
```
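In this definition, `_local_prop` is null whenever `_prop_id` is zero or `_extcall_prop` is null, yet `make_instance_minimal(_ml_real)` runs unconditionally. One possible mitigation, sketched here under assumptions, is to check for a missing `Prop*` before building the instance so the call fails with an error instead of a segfault. `PropSketch`, `extcall_prop_sketch`, `prop_id_sketch` and `hoc_f_guarded` are hypothetical stand-ins, and the C++ exception is only a placeholder for whatever error reporting the generated code would really use.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for the generated globals; not NEURON's declarations.
struct PropSketch {};
PropSketch* extcall_prop_sketch = nullptr;  // plays the role of _extcall_prop
int prop_id_sketch = 0;                     // plays the role of _prop_id

// Sketch of a guarded _hoc_f: resolve the instance the same way the generated
// code does, but stop before building an instance from a null Prop*.
double hoc_f_guarded() {
    PropSketch* local_prop = prop_id_sketch ? extcall_prop_sketch : nullptr;
    if (local_prop == nullptr) {
        // Placeholder: real generated code would report through NEURON's own
        // error mechanism rather than throwing a C++ exception.
        throw std::runtime_error("f_minimal: no mechanism instance selected");
    }
    return 1.0;  // body of FUNCTION f(): f = 1
}

// Usage: fails cleanly with nothing selected, succeeds once an instance is set.
void usage_example() {
    bool threw = false;
    try {
        hoc_f_guarded();
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);

    PropSketch prop;
    extcall_prop_sketch = &prop;
    prop_id_sketch = 1;
    assert(hoc_f_guarded() == 1.0);

    // Reset the globals so they never point at the destroyed local.
    extcall_prop_sketch = nullptr;
    prop_id_sketch = 0;
}
```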

Going down the rabbit hole, it seems `_local_prop` is a `nullptr`, and the line

```cpp
_nrn_mechanism_cache_instance _ml_real{_local_prop};
```

actually calls the `neuron::cache::MechanismInstance` constructor, which contains this snippet:

```cpp
MechanismInstance(Prop* prop)
    : base_type{_nrn_mechanism_get_type(prop), mechanism::_get::_current_row(prop)} {
    if (!prop) {
        // grrr...see cagkftab test where setdata is not called(?) and extcall_prop is null(?)
        return;
    }
```

This seems to originate from this NEURON commit, and is where I sort of lost track of what's going on.
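Reading the snippet, the base class is constructed from `_nrn_mechanism_get_type(prop)` and `_current_row(prop)` before the `if (!prop)` check, so the early `return` plausibly leaves the base without data pointers, which would match the null fields in `_ml_real`. Below is a minimal C++ sketch of that pattern under assumptions: `RangeBase`, `PropSketch`, `current_row_sketch` and `InstanceSketch` are hypothetical names, and mapping a null `Prop*` to the all-ones row is an assumption suggested by the `m_offset` value in the dump, not something confirmed in NEURON's sources.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the pattern in the quoted constructor; not NEURON code.
struct RangeBase {
    double* const* m_data_ptrs{nullptr};  // stays null unless the derived ctor sets it
    std::size_t m_offset;
    explicit RangeBase(std::size_t offset) : m_offset{offset} {}
};

struct PropSketch {
    double* column;   // start of the storage for the single floating-point field
    std::size_t row;  // this instance's row in that storage
};

// Assumption: a null Prop* maps to the "no row" sentinel.
std::size_t current_row_sketch(const PropSketch* prop) {
    return prop ? prop->row : static_cast<std::size_t>(-1);
}

struct InstanceSketch : RangeBase {
    std::array<double*, 1> m_data_ptr_cache{};

    // The base is built from current_row_sketch(prop) *before* the null check,
    // as in the quoted snippet, so the early return leaves it partially initialized.
    explicit InstanceSketch(PropSketch* prop) : RangeBase{current_row_sketch(prop)} {
        if (!prop) {
            return;  // m_data_ptrs stays nullptr, m_offset stays the sentinel
        }
        assert(prop->column != nullptr);
        m_data_ptr_cache[0] = prop->column;
        m_data_ptrs = m_data_ptr_cache.data();
    }

    // m_data_ptrs points into this object, so copies would dangle.
    InstanceSketch(const InstanceSketch&) = delete;
    InstanceSketch& operator=(const InstanceSketch&) = delete;
};
```

Constructing `InstanceSketch{nullptr}` yields an object that looks valid to the compiler but carries a null `m_data_ptrs`; any later accessor that indexes through it reproduces the crash above.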

Going back to the drawing board, we can instead call this Python script:

```python
from neuron import h, gui
s = h.Section()
s.insert("minimal")
s(0.5).minimal.f() # <--- instead of `h.f_minimal()`
```

This doesn't segfault, so the HOC call fails while its Section-based equivalent works. Stopping at `_npy_f` (presumably NEURON's Python counterpart of `_hoc_f`), we get:

```
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001001dfb7c libnrnmech.so`neuron::_npy_f(_prop=0x0000600002106080) at minimal.cpp:187:16
   184          hoc_retpushx(_r);
   185      }
   186      static double _npy_f(Prop* _prop) {
-> 187          double _r{};
   188          Datum* _ppvar;
   189          Datum* _thread;
   190          NrnThread* _nt;
(lldb) p _prop
(Prop *) 0x0000600002106080
```

which is not a `nullptr`, so perhaps the difference between the two code paths is the culprit?
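The two entry points obtain the mechanism instance differently: `_npy_f(Prop* _prop)` receives it as an argument from the segment lookup, while `_hoc_f` reads the global `_extcall_prop`, which an earlier call is expected to set (presumably `setdata_<suffix>`, going by the comment in the `MechanismInstance` snippet). A minimal sketch contrasting the two shapes, assuming simplified hypothetical names (`PropSketch`, `select_instance`, `hoc_style_f`, `npy_style_f`) that are not NEURON's API:

```cpp
#include <cassert>
#include <optional>

// Hypothetical sketch contrasting the two entry points; names are illustrative.
struct PropSketch {
    double f_value;
};

// HOC-style path: the instance comes from a global that an earlier call is
// expected to set (the role _extcall_prop plays for _hoc_f).
PropSketch* g_selected_prop = nullptr;

void select_instance(PropSketch* prop) { g_selected_prop = prop; }

std::optional<double> hoc_style_f() {
    if (g_selected_prop == nullptr) {
        return std::nullopt;  // nothing selected: the state _hoc_f runs into
    }
    return g_selected_prop->f_value;
}

// Python-style path: the caller passes the instance explicitly, as with
// _npy_f(Prop* _prop), so there is no "nothing selected" state to handle.
double npy_style_f(PropSketch* prop) {
    assert(prop != nullptr);  // the segment lookup is expected to supply it
    return prop->f_value;
}
```

In this shape, only the global-based path can observe an unset instance, which is consistent with only `h.f_minimal()` crashing.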

nrnhines commented 1 month ago

I don't know if this will help, but the same issue was very longstanding with nocmodl and was fixed in neuronsimulator/nrn#2460. Also see neuronsimulator/nrn#2475.