Closed lurais closed 1 year ago
I have no context and don't know what you are talking about. You can read a bit about a known problem of LMDB here: https://github.com/erthink/libmdbx/blob/bb8f43181783686879219846d64379a04c1430e3/mdbx.h#L1197
Here is the detail: https://github.com/ledgerwatch/erigon/wiki/LMDB-freelist-illustrated-guide. Is this solved in the Go code but not in the C code? I am not sure how to reproduce it. Is there any detail about how to reproduce it? I want to make sure that other versions of LMDB do not have this problem.
It needs to be solved in the C code (and lmdb-go probably does have our patch). But it's not 100% solved - many corner cases can cause the freelist to grow or slow down. So it's not really solved in LMDB (mdbx is a bit better, but it still depends on DB size / amount of deletes / use-case / etc.). A better solution is to increase the page size: fewer pages mean less freelist maintenance cost.
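The "fewer pages, less freelist maintenance" point can be sketched with simple arithmetic (the sizes below are illustrative, not LMDB internals):

```go
package main

import "fmt"

func main() {
	const dbSize = 64 << 30 // a 64 GiB database, purely illustrative

	// The freelist tracks IDs of free pages. A larger page size means
	// fewer pages overall, so there are fewer entries to maintain.
	for _, pageSize := range []int{4096, 16384, 65536} {
		pages := dbSize / pageSize
		fmt.Printf("pageSize=%6d -> total pages=%d\n", pageSize, pages)
	}
}
```

With 16x larger pages there are 16x fewer page IDs for the freelist to track, at the cost of coarser-grained allocation.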
So, why don't you figure it out and push LMDB to solve it in the C code? Can this problem not be solved completely in all cases? I wonder how I can reproduce it. Is it what is discussed here: https://www.mail-archive.com/openldap-bugs@openldap.org/msg03806.html?
- We did patch the C code, but for our use-case.
- It's not really solvable for all use-cases, especially without breaking the freelist format.
- The main problems are:
  1. Recursive dependency on itself: to update the freelist you need to update the freelist.
  2. The freelist itself is big and may be evicted from the page cache.
  3. It's hard to deal with page fragmentation.
- https://www.mail-archive.com/openldap-bugs@openldap.org/msg03806.html - yes, it's related.
Hello, how is the C code PR going now? Has it been merged? And does LMDB intend to solve it for all use-cases, or do they think it is not a problem because it appears in very few use-cases?
We didn't create a PR to upstream, because it's not easy. You can find the patch in the commits on Nov 10, 2020: https://github.com/ledgerwatch/lmdb-go/commits/master/lmdb/mdb.c
It's not really solvable for all use-cases, especially without breaking the freelist format. Our use-case is DB >> RAM, which is rare, and we sometimes use crypto-hashed keys, which is also rare (it updates many randomly-distributed pages). The issue can also be partially mitigated by avoiding OverflowPages creation: by using values < 2Kb.
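One way to keep values under the overflow-page threshold is to split large values into sub-keyed chunks. Here is a minimal sketch; `chunkValue`, the key suffix scheme, and the exact threshold are hypothetical, not an lmdb-go API:

```go
package main

import "fmt"

// maxInline is a conservative bound below the overflow-page
// threshold mentioned above (values < 2Kb), leaving a little room
// for the chunk-index suffix appended to each key.
const maxInline = 2048 - 8

// chunkValue splits value into pieces that each stay under
// maxInline bytes, keyed as key/00000000, key/00000001, ...
func chunkValue(key, value []byte) map[string][]byte {
	out := make(map[string][]byte)
	for i := 0; len(value) > 0; i++ {
		n := len(value)
		if n > maxInline {
			n = maxInline
		}
		subKey := fmt.Sprintf("%s/%08d", key, i)
		out[subKey] = value[:n]
		value = value[n:]
	}
	return out
}

func main() {
	// A 5000-byte value becomes 3 chunks of at most 2040 bytes each.
	chunks := chunkValue([]byte("acct"), make([]byte, 5000))
	fmt.Println(len(chunks)) // prints 3
}
```

The reader must then concatenate the chunks back in key order; that extra read cost is the trade-off for keeping every value on regular B-tree pages.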
You mean that the DB size is much bigger than RAM, so the pages cannot hold all the data, and data access will need to read or write the hard disk more often?
There are many, many different problems:
Thanks for your attention, but when I tested LMDB performance, it took much more time to read some keys once the size of the DB was bigger than the physical memory size. Have you seen this problem? For example, when the physical memory size is 2G, it takes much more time to read some keys once the data stored in the DB exceeds 2G.
LMDB is an mmap'ed file. You can read any article about what a page fault in mmap is, e.g.: https://biriukov.dev/docs/page-cache/5-more-about-mmap-file-access/#what-is-a-page-fault
As far as I know, mdbx is a fork of LMDB, so how do you avoid this problem?
No silver bullet.
As I see it, if there is not too much data stored in LMDB, the query performance is better than LevelDB's. So do you just use little storage space, or do you think it doesn't matter?
If in your use-case Data << RAM, then probably everything will be "fast enough", and most of the edge-case problems will not happen. The main problem is that you haven't said much about your use-case (is it OLAP, OLTP, highly parallel writes, time-series, ...?).
FYI: LevelDB doesn't support ACID transactions. Geth switched from LevelDB to PebbleDB, which also doesn't support transactions.
Let me quote the mdbx maintainer:
libmdbx is [B-tree](https://en.wikipedia.org/wiki/B-tree) based, with [ACID](https://en.wikipedia.org/wiki/ACID) via [MVCC](https://en.wikipedia.org/wiki/Multiversion_concurrency_control)+[COW](https://en.wikipedia.org/wiki/Copy-on-write)+[Shadow paging](https://en.wikipedia.org/wiki/Shadow_paging), without [WAL](https://en.wikipedia.org/wiki/Write-ahead_logging), but mostly [non-blocking](https://en.wikipedia.org/wiki/Non-blocking_algorithm) for readers etc.
RocksDB is [LSM](https://en.wikipedia.org/wiki/Log-structured_merge-tree) based, with a set of features including compression...
So, both are storage engines, but explaining all the differences is only a little easier than explaining how the universe works.
In fact, if we don't know the type of use-case, can we determine it by event tracking, such as recording all read and write frequencies? As far as I know, the ETH use-case should be relatively well-defined, right? In addition, is the main reason to consider using mdbx its ACID support? And compared to geth, why is ACID necessary for erigon?
"the eth use case should be relatively certain, right?" - nope. Saving 1 new block every 5 seconds is not a problem at all, but then you need to execute it, store the MerkleTrie, store an inverted index for tx-hashes, save the history of state changes and an inverted index for it, etc. Erigon also has dedicated indices for the eth_getLogs and trace_* methods. And it also needs to serve ~100 different RPC methods.
Humanity uses transactions to save itself from "database is in a broken state" issues - it's "data integrity". Erigon has > 50 tables in the DB. It also has > 5 databases. Just read some articles/books about database theory: transactions, isolation levels, durability, etc.
So, compared with geth, is it more important for erigon to guarantee ACID? As far as I know, geth does not support ACID, but there are not many cases of database corruption. If data corruption occurs, the data can also be recovered through fast synchronization, so is it worth spending these costs to support ACID?
In erigon we decided: yes.
Is it because the data organization is different from geth's?
Because I don't want to spend my time debugging all these classical "transfer 1 ETH from account A to account B; after the deduction from account A the app crashed; as a result 1 ETH is lost - nobody has it - and nobody knows that something is wrong - the app just continues working without any warnings/errors/etc." issues.
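The lost-transfer scenario can be sketched in a few lines of Go. `Ledger`, the function names, and the staging scheme are hypothetical toys for illustration, not erigon or LMDB APIs:

```go
package main

import "fmt"

// Ledger is a toy key-value store mapping account -> balance.
type Ledger map[string]int

// transferNonAtomic writes the debit first; if the process dies
// between the two writes, the amount simply disappears.
func transferNonAtomic(l Ledger, from, to string, amount int, crashMidway bool) {
	l[from] -= amount
	if crashMidway {
		return // simulated crash: the credit never happens
	}
	l[to] += amount
}

// transferAtomic stages both writes and applies them together,
// mimicking what a real transaction commit guarantees: either both
// accounts change, or neither does.
func transferAtomic(l Ledger, from, to string, amount int, crashMidway bool) {
	staged := Ledger{from: l[from] - amount, to: l[to] + amount}
	if crashMidway {
		return // crash before commit: nothing was applied
	}
	for k, v := range staged { // "commit": apply all staged writes
		l[k] = v
	}
}

func main() {
	a := Ledger{"A": 10, "B": 0}
	transferNonAtomic(a, "A", "B", 1, true)
	fmt.Println("non-atomic after crash:", a["A"]+a["B"]) // prints 9: 1 ETH lost

	b := Ledger{"A": 10, "B": 0}
	transferAtomic(b, "A", "B", 1, true)
	fmt.Println("atomic after crash:", b["A"]+b["B"]) // prints 10: nothing lost
}
```

The non-atomic version is the "1 ETH lost, nobody has it, no error anywhere" bug: the total silently drops from 10 to 9 and the app keeps running.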
> If data corruption occurs, the data can also be recovered through fast synchronization

This is false:
This question is very old and there are tons of articles on the internet - please google it. In this comment I described only the "Atomicity/Consistency" topics. But there is also "Durability". And there is also "Isolation": users of the RPC will see partial-commit results (invalid data) which will be gone by the next RPC call (the first RPC call will show the lost 1 ETH, the next will show 1 ETH on account B) - and they will report all these unreproducible bugs to you. Enjoy.
You don't need transactions when your app has an insert-only, 1-table workload. But an ETH client needs to handle re-orgs, and for 1 new block it writes: blocks, receipts, state, various indices/mappings, codes of smart contracts, the Merkle trie, etc.
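The multi-table write per block can be sketched the same way: all the writes for one block are staged and applied at a single commit point. The table names below mirror the kinds of data listed above, but the store and its API are hypothetical:

```go
package main

import "fmt"

// Tables is a toy multi-table store: table name -> key -> value.
type Tables map[string]map[string]string

// commitBlock stages the writes for one new block across several
// tables and applies them all at once, so a crash can never leave,
// say, receipts written but state and indices missing.
func commitBlock(db Tables, blockHash string) {
	staged := map[string]map[string]string{
		"blocks":   {blockHash: "header+body"},
		"receipts": {blockHash: "receipts"},
		"state":    {"acct/A": "new balance"},
		"txIndex":  {"tx1": blockHash},
	}
	for table, kv := range staged { // single "commit" point
		if db[table] == nil {
			db[table] = map[string]string{}
		}
		for k, v := range kv {
			db[table][k] = v
		}
	}
}

func main() {
	db := Tables{}
	commitBlock(db, "0xabc")
	fmt.Println(len(db)) // prints 4: all tables updated together
}
```

A re-org is the same idea in reverse: the writes for the abandoned blocks must be undone across every one of those tables in a single atomic step, which is exactly what a transactional engine gives you.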
Hello, I am very interested in the problems you described before, and I want to know how I can reproduce them, because I want to confirm that these problems still exist in other versions of LMDB.