Open cjcobb23 opened 4 years ago
A couple of other ideas that occur to me:
* `update()` holds the lock during the entire update process. This could have performance implications, obviously, so it remains to be determined if any trade off is worth it. It seems a very safe option for data integrity, though.
* `addOrderBook()` signals to `update()` that it made changes while `update()` was working (via a flag or something). `update()` then aborts and starts over with the latest ledger, thus including the new data. This risks `update()` never really finishing on a heavily used node, but it remains to be determined how likely that is to happen.

I make no claim that either of these are good ideas. Just throwing them out for consideration.
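For discussion, here is a rough sketch of the second idea (flag and restart). It is not rippled code: `RestartingIndex`, the `scan` callback, the string book keys, and the `maxAttempts` cap are all hypothetical stand-ins. The cap is my own addition, to bound the "never really finishing" risk.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <set>
#include <string>

// Hypothetical stand-in for OrderBookDB; a book is reduced to a string key.
class RestartingIndex
{
    std::mutex mutex_;
    std::set<std::string> books_;
    bool dirty_ = false;  // set by addOrderBook(), cleared at the start of each scan

public:
    void addOrderBook(std::string const& book)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        books_.insert(book);
        dirty_ = true;  // tell a running update() that its result is stale
    }

    // `scan` is assumed to rebuild the full book set from the latest ledger.
    // Returns the number of attempts used, or 0 if every attempt was
    // invalidated (the existing data is then left untouched).
    int update(std::function<std::set<std::string>()> const& scan, int maxAttempts)
    {
        assert(maxAttempts > 0);
        for (int attempt = 1; attempt <= maxAttempts; ++attempt)
        {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                dirty_ = false;
            }

            // The slow part runs without the lock, as in the current code.
            std::set<std::string> rebuilt = scan();

            std::lock_guard<std::mutex> lock(mutex_);
            if (dirty_)
                continue;  // a book was added during the scan: start over
            books_.swap(rebuilt);
            return attempt;
        }
        return 0;
    }

    bool contains(std::string const& book)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return books_.count(book) != 0;
    }
};
```

The lock is only held briefly on each attempt, so `addOrderBook()` is never blocked for the duration of a scan. The cost is that a busy node may need several full scans before one of them goes uninterrupted.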
* `update()` holds the lock during the entire update process.
This would be great for data integrity, but on my machine, if I start the software with `--load` (meaning I load a full ledger from the database, instead of starting from genesis until network sync), `update()` takes upwards of 5 minutes, which is much too long to hold the lock.
@gregtatcam could you take a look at this?
Issue Description
There is a rare race condition within `OrderBookDB`, between `OrderBookDB::update()` and `OrderBookDB::addOrderBook()`, that results in missing data. `OrderBookDB::update()` processes an entire ledger and then overwrites the underlying data structures via swap: https://github.com/ripple/rippled/blob/97712107b71a8e2089d2e3fcef9ebf5362951110/src/ripple/app/ledger/OrderBookDB.cpp#L148
`OrderBookDB::update()` only holds a lock when performing the swap. `OrderBookDB::addOrderBook()` writes to these same underlying data structures. `update()` could take a long time, as it processes the entire ledger, whereas `OrderBookDB::addOrderBook()` can execute much more quickly.

The race condition is as follows:
1. `OrderBookDB::update()` is called with a ledger with sequence `i`.
2. `OrderBookDB::addOrderBook()` is called while processing a transaction from a ledger with sequence `j`, such that `j` > `i`.
3. `OrderBookDB::addOrderBook()` returns.
4. `OrderBookDB::update()` executes the swap operation and returns.

The above sequence leads to the `OrderBook` added by `OrderBookDB::addOrderBook()` being discarded.

It should be noted that `OrderBookDB::update()` is called when the software starts, and is only called subsequently if there is a gap of 100 or more ledgers in the published ledger stream: https://github.com/ripple/rippled/blob/97712107b71a8e2089d2e3fcef9ebf5362951110/src/ripple/app/ledger/impl/LedgerMaster.cpp#L1235
Therefore, this situation is very rare, and if it does occur, only results in some missing data. It has no ability to cause a crash and is not a security vulnerability.

Possible Solution
`OrderBookDB::update()` needs some way to know about the `Book`s that were added by `OrderBookDB::addOrderBook()` while `OrderBookDB::update()` was running. `OrderBookDB::update()` could then merge these `Book`s after performing the swap.
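
A minimal sketch of the merge idea, to make the proposal concrete. It is not the actual `OrderBookDB` implementation: `BookIndex`, `beginUpdate()`, `finishUpdate()`, and the string book keys are hypothetical names, and the sketch assumes only one update runs at a time. Books added during a rebuild are recorded on the side and re-inserted right after the swap, under the same lock.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>

// Hypothetical stand-in for OrderBookDB; a book is reduced to a string key.
class BookIndex
{
    std::mutex mutex_;
    std::set<std::string> books_;       // the published data
    std::set<std::string> addedSince_;  // books added while a rebuild is running
    bool rebuilding_ = false;

public:
    // First half of update(): called before the slow ledger scan starts.
    void beginUpdate()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        rebuilding_ = true;
        addedSince_.clear();
    }

    // Analogue of addOrderBook(): also remembers the book if a rebuild
    // is in flight, so the later swap cannot discard it.
    void addOrderBook(std::string const& book)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        books_.insert(book);
        if (rebuilding_)
            addedSince_.insert(book);
    }

    // Second half of update(): swap in the rebuilt set, then merge the
    // books that arrived during the scan. Both steps share one lock.
    void finishUpdate(std::set<std::string> rebuilt)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(rebuilding_);  // must follow beginUpdate()
        books_.swap(rebuilt);
        books_.insert(addedSince_.begin(), addedSince_.end());
        addedSince_.clear();
        rebuilding_ = false;
    }

    bool contains(std::string const& book)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return books_.count(book) != 0;
    }
};
```

Replaying the race from the description against this sketch (`beginUpdate()`, then `addOrderBook()`, then `finishUpdate()` with a set that lacks the new book) leaves the new book in place. The lock is still held only briefly, since the merge touches just the books added during the scan.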