Closed jb55 closed 2 weeks ago
I will start from the premise that most lack of formatting on push notifications have the same root cause across all types of notifications (not just repost notifications)
I am seeing two patterns of Ndb initialization error, usually when lmdb loads and memory maps the db file
I will experiment with setting a lower memory map size
- When Damus is closed, it usually fails with error code "12" which is "Cannot allocate memory". It's probably trying to allocate more memory than is allowed by the extension. Some sources say our memory limit for the extension is as little as 25 MB (https://stackoverflow.com/questions/50764003/what-memory-limit-for-apple-notifications-service-extension)
Now I am seeing some instances where it succeeds, and the mdb_env_open failed, error 12
error log appears 3 times before succeeding, which indicates it could allocate about 4GB at that time 🤔
I can't repro the memory error any longer, I will investigate the other one
we can try reducing the number of ingest threads to 1 or 0 to see if that fixes memory alloc issues, these are the only things that really allocate memory (for ingest queues).
oh wait 12 is an lmdb message, so maybe extensions can't do large virtual memory mapping? hmm.
on damus we have a special large virtual memory entitlement, should make sure that we have that on the extension as well.
I just read this in the LMDB docs:
int mdb_env_create(MDB_env **env);
/** @brief Open an environment handle.
*
* If this function fails, #mdb_env_close() must be called to discard the #MDB_env handle.
not sure if this is related, but we don't actually do that! So maybe subsequent smaller mapsize calls are failing when they don't need to (we keep reducing mapsize in a loop until it succeeds).
our default mapsize is 32GiB, which may be causing issues in the extension? We probably need it for larger databases though, so hopefully we can just turn on the same entitlement we use on the main app.
our default mapsize is 32GiB, which may be causing issues in the extension? We probably need it for larger databases though, so hopefully we can just turn on the same entitlement we use on the main app.
It tries to reduce mapsize. In some of my experiments, lmdb succeeded at mapsize ~= 4GB. That is larger than the minimum 700MB mapsize specified in the current code. In those experiments, the notification succeeded formatting. (This is when I stopped being able to repro formatting issues with the app closed).
on damus we have a special large virtual memory entitlement, should make sure that we have that on the extension as well.
Good call, however it does not seem to be available on the extension target. I see that capability option is available on the main app target, but it isn't there on the extension
I tried to closely debug what happens when the notification extension tries to initialize lmdb while the app is running (i.e. In this comment I am talking about the other source of formatting issues, the "error 1")
Here is what seems to happen (to the best of knowledge) during lmdb initialization (Most interesting things happen at mdb_env_setup_locks(MDB_env *env, MDB_name *fname, int mode, int *excl)
):
fcntl
with command F_SETLKW
) on ndb lockfileNULL
pointer (maybe it's just trying to do a locking test, not really trying to access that memory)EAGAIN
(Resource temporarily unavailable)sem_open
) with name at env->me_txns->mti_rmname
, appears to be a read semaphore@jb55, you mentioned (and I also read in some places) that lmdb allows a single-writer but multiple readers, and that you have seen notedeck functioning with multiple processes accessing NostrDB without some mutex or concurrency coordinator, however I have seen some evidence that seems to indicate that lmdb uses different concurrency coordination mechanisms on Darwin/XNU (i.e. macOS + iOS), FreeBSD, and Linux.
This file mdb.c
:
#if defined(__FreeBSD__) && defined(__FreeBSD_version) && __FreeBSD_version >= 1100110
# define MDB_USE_POSIX_MUTEX 1
# define MDB_USE_ROBUST 1
#elif defined(__APPLE__) || defined (BSD) || defined(__FreeBSD_kernel__)
# define MDB_USE_POSIX_SEM 1
# define MDB_FDATASYNC fsync
#elif defined(ANDROID)
# define MDB_FDATASYNC fsync
#endif
#ifndef _WIN32
#include <pthread.h>
#include <signal.h>
#ifdef MDB_USE_POSIX_SEM
# define MDB_USE_HASH 1
#include <semaphore.h>
#else
#define MDB_USE_POSIX_MUTEX 1
#endif
#endif
Indicates that Darwin/XNU/Apple platforms use MDB_USE_POSIX_SEM
as a coordination mechanism, whereas Linux (presumably?) uses MDB_USE_POSIX_MUTEX
. Depending on these constants, the lmdb init path seems to be very different.
Furthermore, I found this info on http://www.lmdb.tech/doc/index.html (Note: Not sure if this is official documentation):
On BSD systems or others configured with MDB_USE_POSIX_SEM, startup can fail due to semaphores owned by another userid.
Fix: Open and close the database as the user which owns the semaphores (likely last user) or as root, while no other process is using the database.
This message feels a bit ambiguous — Is the problem the user the process is running as, the fact that there are two processes accessing the database, or both? However, if I am interpreting this correctly, then it implies that two processes cannot access the same lmdb database at the same time on these platforms.
I am thinking of alternatives to this specific issue of NostrDB access while the app is open, and I think a good solution might be to implement some form of IPC between the app and the extension using stuff like Darwin Notifications (https://www.atomicbird.com/blog/sharing-with-app-extensions/), or some web API exposed on localhost? Perhaps the logic could be as follows:
What do you think?
I think a good solution might be to implement some form of IPC between the app and the extension
Alternatively we can move some of the formatting logic to the server-side as well
interesting investigation! this lead me to this comment by @hyc
https://github.com/agisboye/SwiftLMDB/issues/23#issuecomment-1518677356
IPC and POSIX Semaphores and Shared Memory
Normally, sandboxed apps cannot use Mach IPC, POSIX semaphores and shared memory, or UNIX domain sockets (usefully). However, by specifying an entitlement that requests membership in an application group, an app can use these technologies to communicate with other members of that application group.
...
@hyc: It sounds to me like if you want this to work you're going to need to patch your own builds of LMDB to include your application group name. With that 31-byte limit I don't even see much point to making this settable via LMDB API.
Sounds like we just need to patch the LMDB semaphore name to match our group container?
I wonder if this semaphore name change will allow us to not have to close the DB when it goes into the background as well. that would be amazing.
UPDATE: did not fix it unfortunately :( app still gets killed if the lockfile is present
@danieldaquino can you please try:
I've added the virtual memory entitlement to our notification extension so that you can open lmdb, I've also applied the group name to the semaphore in LMDB, hopefully that fixes weird issues with that.
I have the same PR for highlighter for testing:
Pushed a new internal testflight but it doesn't look like
did anything
Furthermore, I found this info on http://www.lmdb.tech/doc/index.html (Note: Not sure if this is official documentation):
It is, but it's not necessarily the most recent. The docs are generated by doxygen from comments embedded in the LMDB source, thus you can always find the most accurate docs in your own LMDB source tree. Mainly just by reading lmdb.h.
On BSD systems or others configured with MDB_USE_POSIX_SEM, startup can fail due to semaphores owned by another userid.
Fix: Open and close the database as the user which owns the semaphores (likely last user) or as root, while no other process is using the database.
This message feels a bit ambiguous — Is the problem the user the process is running as, the fact that there are two processes accessing the database, or both? However, if I am interpreting this correctly, then it implies that two processes cannot access the same lmdb database at the same time on these platforms.
No. If multiprocess access wasn't supported on those platforms we would say it's not supported. The message is quite clear - there may be problems if the semaphore is owned by a different userid. This is because POSIX semaphores are controlled by Unix permission bits, and we shouldn't have to explain to you how the operating system you're using works. If the permission bits are too restrictive, then only processes from the same userid will have access. If all the processes are running as the same userid then there's no issue.
Thank you very much @hyc for your help!
Furthermore, I found this info on http://www.lmdb.tech/doc/index.html (Note: Not sure if this is official documentation):
It is, but it's not necessarily the most recent. The docs are generated by doxygen from comments embedded in the LMDB source, thus you can always find the most accurate docs in your own LMDB source tree. Mainly just by reading lmdb.h.
Makes sense, I will keep this in mind.
On BSD systems or others configured with MDB_USE_POSIX_SEM, startup can fail due to semaphores owned by another userid.
Fix: Open and close the database as the user which owns the semaphores (likely last user) or as root, while no other process is using the database.
This message feels a bit ambiguous — Is the problem the user the process is running as, the fact that there are two processes accessing the database, or both? However, if I am interpreting this correctly, then it implies that two processes cannot access the same lmdb database at the same time on these platforms.
No. If multiprocess access wasn't supported on those platforms we would say it's not supported. The message is quite clear - there may be problems if the semaphore is owned by a different userid. This is because POSIX semaphores are controlled by Unix permission bits, and we shouldn't have to explain to you how the operating system you're using works. If the permission bits are too restrictive, then only processes from the same userid will have access. If all the processes are running as the same userid then there's no issue.
Interesting. In the case of iOS, some people have mentioned that all iOS apps use the same userid (1), and I have tested this (using iOS 17.6.1) and it seems to be the case for us (Both app and extension processes report to be using userid 501
)
I tested @jb55's patch (https://github.com/damus-io/damus/pull/2410) which follows one of your past recommendations on this topic, and it seems to resolve the issue on my end.
I believe @jb55 might be facing another separate issue with similar symptoms. I will work with him to debug whatever issues remain
btw it's not just me, vanessa has the same issues
are you testing with your local dev build or testflight? I find when I'm connected with a debug session the sandbox behavior is completely different. An example of this is the lockfile. not closing the DB works great when debugger is connected, but when I run the app separately iOS will crash the app in the presence of a lockfile.
are you testing with your local dev build or testflight?
In my previous comment I tested with a local dev build. However, I tested with the newest testflight build you pushed, and it also worked on all 3 app states (foreground, background, closed)
Procedure:
Test 1 preconditions:
c1a89bd61722ba56b25b3f22971ba7de1abf29d3
(current tip of release_1.10
)Test 1 results: Notifications display formatted profile names in all steps
Test 2 preconditions:
a20f3ab2
(Latest TestFlight build as of writing)Test 2 results: Notifications display formatted profile names in all steps
@alltheseas do you have issues with push notification formatting?
for context:
me: iphone 14 pro max (18.0) - 380 mb nostrdb vanessa: iphone 15 pro max (17.6.1) - 13gb nostrdb
@alltheseas
I believe this is fixed now
Looks like this is still an issue