Closed leonardb closed 2 years ago
Hi @leonardb, glad to hear you're finding the project useful! This looks like an interesting one, I'll try to take a look sometime this week.
It looks like I had forgotten to deallocate the future in the resource destructor.
Wonder if this would fix it https://github.com/apache/couchdb-erlfdb/pull/50 ?
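For readers unfamiliar with this class of bug, here is a minimal sketch of how a destructor that skips cleanup leaves owned objects alive. It is not the erlfdb code or the patch in the linked PR: `fake_future`, `future_resource`, the two destructors, and the `live_*` counters are all hypothetical names, and plain pthreads stands in for the NIF and FoundationDB APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a native future handle (not the FoundationDB API). */
typedef struct { int ready; } fake_future;

/* Illustrative counters, loosely analogous to watching allocation counts. */
static int live_futures = 0;
static int live_mutexes = 0;

/* Hypothetical resource wrapper that owns a future and a mutex. */
typedef struct {
    fake_future *future;
    pthread_mutex_t lock;
} future_resource;

static future_resource *resource_create(void)
{
    future_resource *res = malloc(sizeof(*res));
    assert(res != NULL);

    res->future = calloc(1, sizeof(*res->future));
    assert(res->future != NULL);
    live_futures++;

    int rc = pthread_mutex_init(&res->lock, NULL);
    assert(rc == 0);
    (void)rc;
    live_mutexes++;

    return res;
}

/* Leaky destructor: frees the wrapper but never releases what it owns. */
static void resource_dtor_leaky(future_resource *res)
{
    free(res);
}

/* Fixed destructor: releases the future and the mutex before the wrapper. */
static void resource_dtor_fixed(future_resource *res)
{
    free(res->future);
    live_futures--;

    pthread_mutex_destroy(&res->lock);
    live_mutexes--;

    free(res);
}
```

Calling `resource_create()` followed by `resource_dtor_fixed()` returns both counters to zero, while the leaky variant leaves each counter one higher per resource, which is the kind of count that only ever grows.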
@nickva I'll patch my local branch and test now
It seems to resolve the mutex issue.
Just ran into a bit of oddness when I tried to load it up a bit though. (using same funs from earlier)
11> [[begin K = integer_to_binary(X,32), erlfdb:transactional(Db, fun(Tx) -> erlfdb:wait(erlfdb:get(Tx, K)) end) end || X <- lists:seq(10,1000)] || _Y <- lists:seq(1,500)].
../include/internal/ethr_mutex.h:656: Fatal error in ethr_mutex_lock(): Invalid argument (22)
Aborted (core dumped)
@nickva Thank you for the quick patch. I've been running this on one of my nodes in production for 30 minutes to test and it looks good so far.
erlfdb_nif =>
#{binary => {124,132,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
driver_mutex => {117,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
driver_tid => {0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
drv_binary => {152,3547,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
nif_internal => {0,0,0,124,0,0,0,0,0,0,0,0,0,0,0,0,0,0}},
11> [[begin K = integer_to_binary(X,32), erlfdb:transactional(Db, fun(Tx) -> erlfdb:wait(erlfdb:get(Tx, K)) end) end || X <- lists:seq(10,1000)] || _Y <- lists:seq(1,500)].
../include/internal/ethr_mutex.h:656: Fatal error in ethr_mutex_lock(): Invalid argument (22)
Aborted (core dumped)
This is an interesting one. Looks like a use-after-free, perhaps? Is it easy to reproduce? I wonder if it is a resource reference counting bug: a destructor is called, we are deallocating, then some thread tries to lock it. The error comes from:
static ETHR_INLINE void
ETHR_INLINE_MTX_FUNC_NAME_(ethr_mutex_lock)(ethr_mutex *mtx)
{
    int res = pthread_mutex_lock(&mtx->pt_mtx);
    if (res != 0)
        ETHR_FATAL_ERROR__(res);
}
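The snippet aborts on any non-zero return because pthread functions report failures through their return value rather than through `errno`. The sketch below illustrates that convention with plain pthreads. The helper names (`describe_pthread_error`, `relock_errorcheck_mutex`) are hypothetical, and it deliberately does not lock a destroyed mutex, since that is undefined behavior. Instead it relocks an error-checking mutex, which POSIX defines to fail with `EDEADLK`.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK under strict -std modes */
#endif

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a pthread return code as "<message> (<code>)",
 * the same shape as the fatal error line in the crash output. */
static void describe_pthread_error(int res, char *buf, size_t len)
{
    snprintf(buf, len, "%s (%d)", strerror(res), res);
}

/* Lock an error-checking mutex twice from the same thread and return the
 * code produced by the second attempt. The first lock succeeds; the second
 * fails with EDEADLK instead of blocking forever. */
static int relock_errorcheck_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mtx;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&mtx, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&mtx);
    assert(rc == 0);

    /* The error arrives as the return value; errno is not involved. */
    int res = pthread_mutex_lock(&mtx);

    pthread_mutex_unlock(&mtx);
    pthread_mutex_destroy(&mtx);
    return res;
}
```

Passing `EINVAL` to `describe_pthread_error` on Linux yields the same "Invalid argument (22)" text seen in the crash. That code is what `pthread_mutex_lock` can return when handed something that is not a valid, initialized mutex, which fits the use-after-free guess but does not prove it.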
@nickva While I could easily reproduce it earlier, I no longer can, so I guess it may have just been an artifact of a bad build.
If I run into it again and can reproduce I'll open a separate issue.
@leonardb thanks for double-checking!
Merged the fix. Let us know, @leonardb, if you find any more issues!
Firstly, thanks for the bindings. I've been using them in production for quite a while without any serious issues, until now.
As our systems have scaled we noticed a marked increase in 'system' memory growth, until some servers were running out of memory. https://erlangforums.com/t/possible-memory-leak-with-a-nif/978/3
Digging a bit deeper, it looks like it may be an issue in erlfdb_nif with a mutex not being destroyed.
Using
instrument:allocations().
and 20 minutes later
I may be approaching this incorrectly, but I can replicate the increase in mutex counts just by performing transaction set operations and the mutex count does not ever decrease.