bitcoinjs / indexd

An external bitcoind index management service module
ISC License

Add independent txindex #6

Open dcousens opened 7 years ago

dcousens commented 7 years ago

This would mean we could run indexd on a pruning node, provided it is fully synchronized to start.

dcousens commented 7 years ago

The three options as I see them:

dcousens commented 7 years ago

If we maintain our own txindex, not only can we bundle it with https://github.com/bitcoinjs/indexd/issues/9 - but we can preserve the entire transaction for other analysis.

@runn1ng the concern here is, this could severely blow out a disk in terms of space required... Maybe optional?

karelbilek commented 7 years ago

What would be the motivation for running a pruning node and a txindex at the same time? A user who prunes wants to save disk space, which is then negated by storing a txindex :)

Also, what is the reasoning for having a separate index here instead of relying on bitcoind?

karelbilek commented 7 years ago

Btw, getting back to bitcore fork (I am going back to it, because it does what I need :), but it's a bit painful to maintain because of the rather large patchset)

If you have addressindex, spentindex, timestampindex and txindex enabled, disk usage grows significantly: I think to about 2 or 3 times the size of the blockchain without the indexes.

unsystemizer commented 7 years ago

addrindex and txindex don't add nearly as much. If the exact figures are important I can look them up. A motivation of having indexes with a pruned blockchain would be you could fetch tx details later (not necessarily from the same bitcoind) no?

dcousens commented 7 years ago

@unsystemizer exactly

instagibbs commented 7 years ago

@runn1ng I don't have any special info but txindex may someday get retired from bitcoind, especially if external indexes like this are successful. Core contributors in general are quite down on additional indexes due to complexity and interactions with consensus code.

unsystemizer commented 6 years ago

Just did a repeat of the same experiment somebody did with addrindex before: a) Create txindex on testnet (mainnet is larger, so ...) b) Use the max dbcache value (16GB) on bitcoind

root@indexd:~/.bitcoin/testnet3/blocks/index# du -sh
949M .
...
root@indexd:~/.bitcoin# tail -f /root/.bitcoin/testnet3/debug.log
...
2017-10-28 13:41:29 Cache configuration:
2017-10-28 13:41:29 * Using 1024.0MiB for block index database
2017-10-28 13:41:29 * Using 8.0MiB for chain state database
2017-10-28 13:41:29 * Using 15352.0MiB for in-memory UTXO set

Conclusions:

It'd be valuable to be able to either load txindex in RAM (mainnet) or avoid wasting many GBs of RAM on caching in-memory UTXOs (testnet) in order to be able to fully cache txindex (edit: in indexd, of course)

KanoczTomas commented 6 years ago

@unsystemizer I have a feeling you are mixing up txindex and chainstate. The UTXO set in bitcoind is in chainstate dir, while the txindex is in blocks/index. The chainstate is currently 2.8G for mainnet, a full index is 14G. edit: of course I could be wrong ... that is my understanding from chat with the devs.

The dbcache switch is only used for the chainstate db in bitcoind. Just wanted to make sure you have the correct assumptions; I'm not sure what you were trying to calculate. A 4G dbcache is effectively infinite space while syncing, as the UTXO set never reaches it (core devs use it as the max in benchmarks).

unsystemizer commented 6 years ago

What would be the motivation of having at the same time pruning node and txindex?

@runn1ng exactly that. It allows me to keep my indexes on a fast node (while running bitcoind on a slow node) and at the same time doesn't require the bitcoind admin to worry about index maintenance. There are several other reasons, some of which I mentioned 2 comments above. Regarding your point on txindex size from the other comment, I checked on both testnet and mainnet, and currently txindex occupies (approximately) 10% of block capacity (on a non-pruned node). If both addrindex and txindex are enabled (using Bitcoin Core with the addrindex patch), they take (approximately) 25% of block data capacity (addrindex 15% and txindex 10%, roughly speaking).
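
To make the ratios above concrete, here is a rough back-of-the-envelope sketch. The function name and the 100 GB input are hypothetical; only the 10% and 15% ratios come from the figures quoted in this comment, and they are approximate.

```python
def estimate_index_sizes(block_data_gb, txindex_ratio=0.10, addrindex_ratio=0.15):
    """Estimate index sizes from the size of the raw block data.

    The default ratios are the approximate figures quoted in the thread
    (txindex ~10%, addrindex ~15% of block data); they are not measured here.
    """
    txindex_gb = block_data_gb * txindex_ratio
    addrindex_gb = block_data_gb * addrindex_ratio
    return {
        "txindex_gb": txindex_gb,
        "addrindex_gb": addrindex_gb,
        "total_gb": txindex_gb + addrindex_gb,
    }


if __name__ == "__main__":
    # 100 GB is an illustrative input, not a figure from the thread.
    print(estimate_index_sizes(100))
```

With these ratios, both indexes together come to roughly a quarter of the block data size.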

dcousens commented 6 years ago

@unsystemizer the issue with a local transaction index, is that we can't index into the .dat files themselves, as they may be in an indeterminate state (as bitcoind updates them).

We would have to maintain nearly the entire blockchain in our database, as the block headers only account for 80 bytes...

Hence, why I suggest that sane users will think they should prune... which could make any "catch up"/"resync" phase difficult if the data is missing.

I agree that you could use an external node for initial resync, then the local pruning node after that.

dcousens commented 6 years ago

Another alternative is if we could ask network peers for the blocks on the initial sync... then continue as normal with our pruned node. We don't want to ask random peers directly, as we don't want indexd to have to verify consensus rules.

If the block was verified by bitcoind on the way... that'd be near perfect.

Maybe a new RPC call for pruned nodes? fetchblock, with a condition the block has to be on the best chain. For non-pruned nodes, it is an alias to getblock.
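
A minimal client-side sketch of the semantics proposed above, under stated assumptions: `fetchblock` does not exist in bitcoind, so this only models the "must be on the best chain, otherwise behave like getblock" condition using the existing getblockheader and getblock RPCs. The `rpc` callable, `fetch_block` and `NotOnBestChain` are hypothetical names; the peer-fetching part for pruned nodes would have to live inside bitcoind and is not shown.

```python
from typing import Any, Callable

# Hypothetical JSON-RPC caller: rpc(method, *params) -> decoded result.
Rpc = Callable[..., Any]


class NotOnBestChain(Exception):
    """Raised when the requested block is not part of the best chain."""


def fetch_block(rpc: Rpc, block_hash: str) -> Any:
    # getblockheader works on pruned nodes, since headers are always kept.
    header = rpc("getblockheader", block_hash)
    # Bitcoin Core reports confirmations == -1 for blocks off the main chain.
    if header["confirmations"] < 0:
        raise NotOnBestChain(block_hash)
    # On a non-pruned node this is all the proposal needs: an alias to getblock.
    return rpc("getblock", block_hash)
```

The best-chain check is what keeps indexd from having to verify consensus rules itself: it only accepts blocks that bitcoind already considers part of the active chain.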

@theuni thoughts, could this be possible?

This would allow us to resync using a pruned node, and therefore drop our dependency on -txindex by maintaining our own.

The option to fast synchronize via something like fast-dat-parser could still be done, as that is an offline-step to initialize the local database, and is more of an overall deployment consideration.

dcousens commented 6 years ago

In the meantime, indexd could use the pruneblockchain RPC command to signal where it is up to: with -prune=1 (aka manual RPC pruning only), indexd could signal when it is safe to prune.

This wouldn't help if the database is lost, but, it would stop there being too much data duplication.
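
A rough sketch of that idea, assuming bitcoind runs with -prune=1 and that indexd tracks the height it has fully indexed. The `rpc` callable, the function name and the `keep_blocks` margin are hypothetical; the default of 288 is an assumption chosen here as a safety margin below the indexed height, not something indexd defines.

```python
from typing import Any, Callable, Optional

# Hypothetical JSON-RPC caller: rpc(method, *params) -> decoded result.
Rpc = Callable[..., Any]


def prune_up_to_indexed(rpc: Rpc, indexed_height: int, keep_blocks: int = 288) -> Optional[int]:
    """Ask bitcoind to prune blocks that indexd has already indexed.

    Returns whatever pruneblockchain returns, or None if nothing can be
    pruned yet.
    """
    target = indexed_height - keep_blocks
    if target <= 0:
        # Not enough indexed history yet; leave the block files alone.
        return None
    # pruneblockchain takes a block height (or a UNIX timestamp for large values).
    return rpc("pruneblockchain", target)
```

Calling this after each batch of indexed blocks would keep bitcoind from discarding data indexd has not seen yet, while still limiting the duplication between the two stores.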

theuni commented 6 years ago

@dcousens If I'm understanding your question, I think https://github.com/bitcoin/bitcoin/pull/10794 would do what you want?

dcousens commented 6 years ago

@theuni yes it would. Thanks for pointing that issue out.