bitwiseworks / libcx

kLIBC Extension Library
GNU Lesser General Public License v2.1

RPM using LIBCx-enabled DB4 (BDB) w/o DB_PRIVATE fails #32

Open dmik opened 7 years ago

dmik commented 7 years ago

The BDB implementation originally uses shared mmap regions, but this doesn't currently work, even with #29 fixed. The original BDB ticket is http://trac.netlabs.org/ports/ticket/7. When DB_PRIVATE is removed from RPM (see the original ticket), RPM fails with the following errors when running, e.g., yum install lemon:

error: rpmdb: read: 0x7c5c6140, 16384: Invalid argument
error: db4 error(22) from dbcursor->c_get: Invalid argument

Note that this happens even before YUM asks you to confirm the install (Yes/No).

dmik commented 7 years ago

One interesting observation: when RPM is compiled with the DB_PRIVATE hacks removed (see http://trac.netlabs.org/ports/ticket/7#comment:8), it mmaps many additional DB files which don't normally exist. Their names look like /var/lib/rpm/__db.001, /var/lib/rpm/__db.002 and so on. These mappings don't go through the __os_mapfile BDB call. There is also __os_attach, which uses mmap, so that must be where they come from (__os_attach apparently doesn't use mmap in DB_PRIVATE mode).

dmik commented 7 years ago

I debugged it a bit. Removing the DB_PRIVATE hacks from RPM causes __os_attach to be called in mmap mode; with DB_PRIVATE, __os_attach isn't called at all. The shmget mode in __os_attach does not seem to be involved at all on OS/2 (even though it's implemented and should work). I have no idea what it all means so far. A deeper look at the BDB/RPM code is required.

dmik commented 7 years ago

I tried removing the DB_PRIVATE hacks from RPM and converting all MAP_SHARED calls to MAP_PRIVATE in LIBCx, and this also fixes the problem. Strange. The mmap logic in LIBCx is the same for MAP_PRIVATE as for MAP_SHARED; the only difference is the DosAllocMem call. This may be a hint...
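For reference, the difference between the two flags at the POSIX level can be sketched like this. It is a minimal illustration assuming a POSIX system such as Linux, not LIBCx code; the helper name byte_in_file_after_mapped_write is hypothetical. It writes one byte through a mapping and then checks what a plain file read sees.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of LIBCx or BDB): map a one-page,
 * zero-filled temp file with the given flags, overwrite its first byte
 * through the mapping, unmap, and return the byte that a regular file
 * read then sees. Returns -1 on any error. */
int byte_in_file_after_mapped_write(int map_flags)
{
    char path[] = "/tmp/mapdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file lives on until fd is closed */

    long page = sysconf(_SC_PAGESIZE);
    int result = -1;

    if (page > 0 && ftruncate(fd, page) == 0) {
        char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                       map_flags, fd, 0);
        if (p != MAP_FAILED) {
            p[0] = 'X';
            if (map_flags & MAP_SHARED)
                msync(p, (size_t)page, MS_SYNC); /* push the write to the file */
            munmap(p, (size_t)page);

            unsigned char c;
            if (pread(fd, &c, 1, 0) == 1)
                result = c;
        }
    }
    close(fd);
    return result;
}
```

With MAP_SHARED the write reaches the file, so the helper returns 'X'; with MAP_PRIVATE the write stays in a private copy and the file still holds 0. A MAP_PRIVATE build therefore never has to propagate changes back to the file.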

dmik commented 7 years ago

Well, except that for shared mappings we track writes to pages and flush these writes back to the underlying file. Apparently this part in LIBCx doesn't work right and causes BDB to freak out.

dmik commented 7 years ago

I found out that just disabling the code that resets/restores the PAG_WRITE bit used to track dirty pages solves the BDB failure. This really puzzles me, as modifying PAG_WRITE should not do any harm to the data...
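To illustrate the general idea of tracking dirty pages through the write-permission bit, here is a POSIX analogue. It is only a sketch under stated assumptions: it uses mprotect and a SIGSEGV handler in place of the OS/2 PAG_WRITE flag and exception handling, all names (tracker_init, tracker_write, tracker_is_dirty, tracker_flush) are hypothetical, and it is not the LIBCx implementation. It assumes Linux, where calling mprotect from a signal handler works in practice even though POSIX does not list it as async-signal-safe.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 4

static char *region;                          /* tracked memory region */
static long page_size;
static volatile sig_atomic_t dirty[NPAGES];   /* one dirty flag per page */

/* First write to a read-only page lands here: mark the page dirty and
 * make it writable so the faulting instruction succeeds when retried. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)region;
    if (addr < base || addr >= base + (uintptr_t)(NPAGES * page_size))
        _exit(1);                             /* not our region: real crash */
    size_t idx = (addr - base) / (size_t)page_size;
    dirty[idx] = 1;
    mprotect(region + idx * (size_t)page_size, (size_t)page_size,
             PROT_READ | PROT_WRITE);
}

/* Allocate the region read-only and install the fault handler. */
int tracker_init(void)
{
    page_size = sysconf(_SC_PAGESIZE);
    region = mmap(NULL, (size_t)(NPAGES * page_size), PROT_READ,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Write one byte at the start of a page (volatile keeps the store in order). */
void tracker_write(int page, char value)
{
    ((volatile char *)region)[(size_t)page * (size_t)page_size] = value;
}

int tracker_is_dirty(int page) { return dirty[page]; }

/* A real implementation would write dirty pages back to the file here;
 * this sketch only clears the flags and removes write permission again. */
int tracker_flush(void)
{
    for (int i = 0; i < NPAGES; i++)
        dirty[i] = 0;
    return mprotect(region, (size_t)(NPAGES * page_size), PROT_READ);
}
```

In this sketch the data in the page is never touched by the protection change itself; only the access rights flip between read-only and read-write.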