bitcoin / bitcoin

Bitcoin Core integration/staging tree
https://bitcoincore.org/en/download
MIT License

blockchain download speed #8738

Closed Ayms closed 7 years ago

Ayms commented 8 years ago

There must be something wrong somewhere, I read https://github.com/bitcoin-dot-org/bitcoin.org/issues/846 and https://bitcoin.org/en/release/v0.10.0#faster-synchronization and other posts on the subject but it just took bitcoin core on my Windows PC 10 DAYS to get the entire blockchain(!!)

With a ~10 Mbps connection, usually rising to 16 Mbps while torrenting, and doing almost nothing other than downloading the blockchain day and night

How can this be explained? I would have retrieved it much faster from bittorrent

jonasschnelli commented 8 years ago

The bottleneck is not your internet bandwidth.

The slowdown happens because of block verifications causing thousands of operations every second including database (utxo-set) lookups on your disk or in the db memory cache.

Try to use an SSD and/or set your -dbcache to a value around 4000 (depending on how much RAM you have available).
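The general effect of a larger in-memory cache on the number of slow disk lookups can be illustrated with a toy simulation. This is only a sketch: the LRU eviction policy, the uniform random access pattern, and the function name `count_slow_lookups` are assumptions made for illustration and do not model Bitcoin Core's actual database cache.

```python
from collections import OrderedDict
import random


def count_slow_lookups(accesses, cache_size):
    """Count lookups that miss a toy LRU cache and would have to hit the disk."""
    cache = OrderedDict()
    misses = 0
    for key in accesses:
        if key in cache:
            cache.move_to_end(key)         # mark as most recently used
        else:
            misses += 1                    # would be a slow disk lookup
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used entry
    return misses


# Hypothetical workload: 50,000 lookups spread over 10,000 distinct entries.
rng = random.Random(1)
accesses = [rng.randrange(10_000) for _ in range(50_000)]
for size in (100, 1_000, 10_000):
    print(size, count_slow_lookups(accesses, size))
```

Once the cache is large enough to hold every entry, the only remaining slow lookups are the first accesses to each entry.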

Ayms commented 8 years ago

Well, there must be some subtlety I am not aware of (and I am trying to see what happens for a normal user, not someone able to use options)

Do you mean that sum(check downloaded block(s) + time to download block(s)) !~ (check blockchain while opening bitcoin core)+ (total time to download all the blocks) ?

check blockchain while opening bittorrent ~ 30/40mn for me

Worst case: blockchain is 100 GB, 10 Mbps --> 22 hours to download with bittorrent (worst case again if we consider this is slower)

Total: < 1 day, this is very far from 10 days
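The worst-case estimate above can be checked with a short calculation. This is a minimal sketch assuming decimal gigabytes (1 GB = 10^9 bytes) and a link that sustains its nominal rate; the function name `transfer_hours` is chosen here for illustration.

```python
def transfer_hours(size_gb: float, rate_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes at rate_mbps megabits per second."""
    bits = size_gb * 1e9 * 8              # decimal gigabytes -> bits
    seconds = bits / (rate_mbps * 1e6)    # megabits per second -> bits per second
    return seconds / 3600


print(f"{transfer_hours(100, 10):.1f} hours")  # prints "22.2 hours"
```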

As far as I understand bitcoin core only connects to 8 peers, so the download bandwidth is limited to only the [sum of the upload bandwidth of the 8 peers], knowing that unlike bittorrent your goal is not to use the totality of the upload bandwidth of those peers I suppose

In those conditions I really don't see how you can compete with bittorrent and how the method can be faster

sipa commented 8 years ago

Again: the speed is dominated by how fast you can verify the blocks, not how fast you can download them.

If you'd download them via BitTorrent, you'd still need 10 days to verify them in your setup, in addition to the 1 day to download them. Bitcoin Core verifies simultaneously with downloading, avoiding the need for double storage and needing to wait for the download to complete before starting to validate. That verification can be sped up a lot if you can assign more memory to the database cache as @jonasschnelli mentioned.

10 days does sound extremely long though. Can you share some information about your hardware (disk, cpu, memory)?

martin-lizner commented 8 years ago

As far as I remember it took me about 20 hours to download the whole blockchain (80 GB) on my 4-core Skylake CPU. Compared with a Raspberry Pi 3, which took about 3 weeks to do that on the same internet line, this illustrates the dependency on computing power.

But I wonder, if there is anything that Core can do/plan to reduce the full-download time?

sipa commented 8 years ago

We're continuously working on it.

I suggest you try syncing with 0.11 (before secp256k1 validation), or 0.9 (before headers-first sync), or 0.7 (before utxo database).

martin-lizner commented 8 years ago

Yeah, that's right, secp256k1 did speed things up a lot! Is there still potential for improvement? Could you share any ideas?

Ayms commented 8 years ago

@sipa PS : your previous reply was talking about 3 days... Anyway, that's not important, my hw is a normal windaube PC, 5 years old CPU, 3Go

For some research/project purposes I must behave like a normal user, ie I have plenty of means to retrieve the blocklist more efficiently if I like, but here I am trying to act just like a normal user, and again the intent here is not to criticize things but to understand why they are like this and if possible to improve them, 10 days for (only) 80 GB + all checks, that's not possible, something is wrong

Again: the speed is dominated by how fast you can verify the blocks, not how fast you can download them.

You cannot speculate on the user processing capabilities/bandwidth, ie if I have a 100 Mbps bandwidth with high CPU/RAM then in this model the 8 peers scenario remains a very very serious limitation, it's impossible to pretend being faster than bittorrent with this

And if I can download faster than checking the blocks then there are no reasons to block this

Bitcoin Core verifies simultaneously with downloading, avoiding the need for double storage and needing to wait for the download to complete before starting to validate

You can torrent the blockchain and check the blocks simultaneously too, for bitcoin you would need some luck to get contiguous pieces but modern bittorrent clients implement streaming which means that pieces are retrieved somewhere sequentially (not exactly but if some next pieces are required urgently the algorithm will retrieve them first)

For now I don't know why 30mn when you open bitcoin core to reverify the blockchain becomes 3 or 10 days when you download it (??)

Probably it has already been thought in the past but it does not seem unlikely to envision a bittorrent client with bitcoin client where the blockchain is distributed in several trusted/verifiable infohashes (and seeded by both bt/btc) and does not need to be rechecked, and where the DHT can help storing information to unload the blockchain

sipa commented 8 years ago

10 days for (only) 80 GB + all checks, that's not possible, something is wrong

Yes, I agree. We should investigate that. I would very much like to know what hardware you're running on (CPU, memory, disk, software version), and perhaps debug.log files. Perhaps a particular attack exists on the network that managed to interfere with your download process. Perhaps there is a bug in the download logic. But I can't judge that without seeing more information. What I can tell you is that downloading is generally not the issue - validation is, and a torrent won't help.

You cannot speculate on the user processing capabilities/bandwidth, ie if I have a 100 Mbps bandwidth with high CPU/RAM then in this model the 8 peers scenario remains a very very serious limitation, it's impossible to pretend being faster than bittorrent with this

The fastest numbers I know are possible with good connection, lots of memory, fast disk, many fast CPU cores, is around 4 hours for validating the entire chain. That's approximately 47 Mbit/s of block validation. More common on regular hardware is around 10 Mbit/s. Your 100 Mbit/s is not ever going to matter - all you need is to be able to satisfy the demand of that 10 Mbit/s. With slow hardware and little memory, it can certainly drop much lower.
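The validation rate quoted above follows from dividing the chain size by the validation time. The chain size behind the 47 Mbit/s figure is not stated in the thread, so the two sizes below (80 GB, as mentioned earlier in the thread, and 85 GB) are assumptions, as is the function name `validation_rate_mbps`.

```python
def validation_rate_mbps(chain_gb: float, hours: float) -> float:
    """Average validation throughput in Mbit/s for chain_gb gigabytes in the given hours."""
    bits = chain_gb * 1e9 * 8    # decimal gigabytes -> bits
    return bits / (hours * 3600) / 1e6


for size_gb in (80, 85):         # assumed chain sizes
    print(size_gb, round(validation_rate_mbps(size_gb, 4), 1))
# prints:
# 80 44.4
# 85 47.2
```

Under these assumptions, a chain of roughly 85 GB validated in 4 hours gives the quoted figure of about 47 Mbit/s.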

For now I don't know why 30mn when you open bitcoin core to reverify the blockchain becomes 3 or 10 days when you download it (??)

What does 30mn mean?

Probably it has already been thought in the past but it does not seem unlikely to envision a bittorrent client with bitcoin client where the blockchain is distributed in several trusted/verifiable infohashes (and seeded by both bt/btc) and does not need to be rechecked, and where the DHT can help storing information to unload the blockchain

We used to maintain a blockchain torrent until 0.9, but it was discontinued in 0.10 for the reasons mentioned. With a torrent you'd have to wait until the majority of the chain arrived, as we can only validate linearly.

Ayms commented 8 years ago

Yes, I agree. We should investigate that. I would very much like to know what hardware you're running on (CPU, memory, disk, software version), and perhaps debug.log files.

Will try to get this but first will relaunch the whole download to be sure (results are not precise here, 10 days might be 8 days since I had to shut down the computer sometimes, how to activate debug.log?)

The fastest numbers I know are possible with good connection, lots of memory, fast disk, many fast CPU cores, is around 4 hours for validating the entire chain. That's approximately 47 Mbit/s of block validation. More common on regular hardware is around 10 Mbit/s. Your 100 Mbit/s is not ever going to matter - all you need is to be able to satisfy the demand of that 10 Mbit/s. With slow hardware and little memory, it can certainly drop much lower.

I see two problems with this approach and the 8 connections limitation: you limit the download for those that have the 47 Mbps capacity and it's not very likely that you reach 10 Mbps for normal users if we estimate that the 8 peers upload bandwidth is < 1 Mbps (and usually closer to 500 kbps) knowing again that it's not the target of btc to saturate the upload bandwidth of the peers, so you are more likely to get something like 2 Mbps (so 4/5 days already... any way to measure this in bitcoin core?)

What does 30mn mean?

30 minutes to check the blockchain when I open bitcoin core, I read your docs but did not find the difference between this check and the checks performed while downloading the blocks (btw can we force bitcoin core to recheck the whole blockchain the way it does when downloading blocks?)

We used to maintain a blockchain torrent until 0.9, but it was discontinued in 0.10 for the reasons mentioned. With a torrent you'd have to wait until the majority of the chain arrived, as we can only validate linearly.

I know, but you maybe missed a part of what I wrote above, you could partially validate blocks not arriving in order, but maybe it would introduce more complexity than benefit, but again bittorrent clients retrieve pieces sequentially, so you don't have to wait for the whole file.

We can call it torrent or it can be derived from btc protocol or whatever, does not matter if we reach the expected result, something built-in doing torrent like job or outside (if you look at https://github.com/Ayms/node-Tor#anonymous-serverless-p2p-inside-browsers---peersm-specs and https://github.com/Ayms/node-Tor#differences-with-bittorrent you have an example to achieve this without mimicking at all the bt protocol), bt looks a good candidate because it can be seeded by non btc peers (that have other things to do than serving files) and is widely deployed, but can be a mix of both, most likely the "bootstrap" blockchain would be divided in several well known "torrents", that can't be faked if we suppose that sha1 is not broken, and/or can be secured by other means than usual bt metadata, then why would it be needed to recheck them? I suppose this has already been studied/discussed

sipa commented 8 years ago

The fastest numbers I know are possible with good connection, lots of memory, fast disk, many fast CPU cores, is around 4 hours for validating the entire chain. That's approximately 47 Mbit/s of block validation. More common on regular hardware is around 10 Mbit/s. Your 100 Mbit/s is not ever going to matter - all you need is to be able to satisfy the demand of that 10 Mbit/s. With slow hardware and little memory, it can certainly drop much lower.

I see two problems with this approach and the 8 connections limitation: you limit the download for those that have the 47 Mbps capacity and it's not very likely that you reach 10 Mbps for normal users if we estimate that the 8 peers upload bandwidth is < 1 Mbps (and usually closer to 500 kbps) knowing again that it's not the target of btc to saturate the upload bandwidth of the peers, so you are more likely to get something like 2 Mbps (so 4/5 days already... any way to measure this in bitcoin core?)

You're making assumptions. Please. This has been tested over and over again, and no, the downloading is not a bottleneck in normal circumstances. If you have evidence for the opposite, I'd very much like to see it, but we can't keep arguing based on hypotheses.

Put the contents of your debug.log file somewhere where people can have a look (it's found in your data directory, and a google search will give you more information).

What does 30mn mean?

30 minutes to check the blockchain when I open bitcoin core, I read your docs but did not find the difference between this check and the checks performed while downloading the blocks

Ah, no, that is just a consistency check to see whether the database is in a valid state. It does not validate the blocks again.

We used to maintain a blockchain torrent until 0.9, but it was discontinued in 0.10 for the reasons mentioned. With a torrent you'd have to wait until the majority of the chain arrived, as we can only validate linearly.

I know, but you maybe missed a part of what I wrote above, you could partially validate blocks not arriving in order, but maybe it would introduce more complexity than benefit, but again bittorrent clients retrieve pieces sequentially, so you don't have to wait for the whole file.

We already do all validation possible at the time blocks arrive, and we do download blocks out of order. However, we do not download blocks more than 1000 (approximately one week worth) ahead of the latest fully validated one, to make sure we don't stall full validation for too long. Furthermore, we automatically select the best peers by continuously kicking off the slower ones. This mechanism seems to work very well in practice.
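The combination of out-of-order download and in-order validation with a bounded look-ahead window can be sketched as a small simulation. This is a toy model under stated assumptions, not Bitcoin Core's download logic: blocks arrive in uniformly random order within the window, peers are not modeled, and the names `simulate_sync` and `WINDOW` are invented here. At the ten-minute target block interval, 1000 blocks correspond to about 6.9 days, which matches the "approximately one week" figure.

```python
import random

WINDOW = 1000  # max blocks fetched ahead of the last validated one (figure from the thread)


def simulate_sync(chain_length, window=WINDOW, seed=0):
    """Return the furthest a received block ever got ahead of the validated tip."""
    rng = random.Random(seed)
    validated = 0        # heights 1..validated are fully validated
    downloaded = set()   # heights received but not yet validated
    max_ahead = 0
    while validated < chain_length:
        # Only heights inside the window that have not arrived yet may be requested.
        upper = min(chain_length, validated + window)
        candidates = [h for h in range(validated + 1, upper + 1) if h not in downloaded]
        height = rng.choice(candidates)  # blocks arrive out of order
        downloaded.add(height)
        max_ahead = max(max_ahead, height - validated)
        # Full validation is linear: advance only while the next block is present.
        while validated + 1 in downloaded:
            validated += 1
            downloaded.remove(validated)
    return max_ahead


print(simulate_sync(2000))  # never exceeds WINDOW
```

The window bounds how much unvalidated data can accumulate while the node waits for the one block it needs next.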

The expensive part of validation can only be done when you have the unspent outputs the transactions in that block depend on. Those are kept in a database called the chainstate, and it depends on all previous blocks. It's a reasonable suggestion to say we should do validation as soon as possible, but please research how things work first.

To summarize:

Ayms commented 8 years ago

"how things work first" I know and "how things work already" I don't, your docs, while good, do not describe everything, but indeed I will figure it out myself, I really don't see the advantage of reindexing everything when you first get/download the blockchain compared to downloading bootstrap state or intermediary ones and compute the delta (made a mistake and forgot to backup the chainstate before reindexing, have to wait for days now a priori...)

"Your making assumptions" + "the downloading is not a bottleneck in normal circumstances" + "This has been tested over and over again" + "Furthermore, we automatically select the best peers by continuously kicking off the slower ones": we will see, for me the 8 peers theory + kicking off mechanism is not enough (and maybe even dangerous) for now and the future, at least for the bootstrap/sync phase

Launched a reindex with debug activated but despite extensive googling I can't find debug.log, neither in the data dir nor in appdata (installed bitcoin on a separate drive), all I have is a debug.txt that does not seem to say a lot

maflcko commented 8 years ago

You can find it in your data dir.

afk11 commented 8 years ago

@Ayms mentioned earlier:

"Anyway, that's not important, my hw is a normal windaube PC, 5 years old CPU, 3Go"

I wouldn't expect it to operate as fast as stated given this hardware. It might be better to have specific models from you, but I think I know where that will go (it's too old, & is that 3Gb of RAM?)

I used to have some luck avoiding the initial checks with -checkblocks, but I see the default setting has changed since I used it last (used to do 240, it's now 6). -checklevel might also help.

Maybe mention what version you're running? If it's not a signed release, what git revision?

Ayms commented 7 years ago

I sent my conf to @jonasschnelli and @sipa , @afk11 the version is 0.13.0 downloaded from bitcoin.org

@MarcoFalke thanks I saw this link and tried different things but definitely still no signs of debug.log, even worse debug.txt looks frozen now

Probably unlucky or not doing things correctly, I launched a reindex, it destroyed the chainstate, after 3 hours bitcoin core told me it was finished but nothing in the chainstate dir, relaunched bitcoin core and it started to reindex from the beginning...

rebroad commented 7 years ago

@Ayms Please can you post a link to your debug.log (e.g. a pastebin URL) so that people can look at it if they want to help identify the cause of delay. For it to be useful you'll need to have had debug=net in your bitcoin.conf file though.

I am of the opinion that it would make sense to make block download without validation an option if you trust the source of the data you are downloading it from - e.g. another machine you own and trust.

Ayms commented 7 years ago

Is debug=net enough? (reindexing right now since Saturday, will test the blocks after) Apparently debug.log is debug.txt for me

Another thing: I modified one bit of one block last week and it was not detected when opening bitcoin core, normal?

I am of the opinion that it would make sense to make block download without validation an option if you trust the source of the data you are downloading it from - e.g. another machine you own and trust.

What is the big issue with envisioning to torrent the blocks (+ state)?

Poseidonn77 commented 7 years ago

@Ayms Did you set the number of threads in BTC-core to your number of CPU cores?

laanwj commented 7 years ago

Another thing: I modified one bit of one block last week and it was not detected when opening bitcoin core, normal?

Of a random block, on disk? Yes, small chance that is detected, unless a) you request the block using getblock b) a peer requests it. It's not like bitcoin core reads and checks all gazillion blocks at startup, now that would be slow...
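Why a flipped bit goes unnoticed until something actually re-reads and re-hashes the data can be shown with a short sketch. Bitcoin block hashes are double SHA-256, but everything else here is an assumption for illustration: the 80 placeholder bytes are not a real block header, and the helper names are invented. Nothing below reflects how Bitcoin Core stores or checks blocks on disk.

```python
import hashlib


def double_sha256(data: bytes) -> str:
    """Hex digest of SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with a single bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


stored = bytes(range(80))              # placeholder bytes, not a real block header
recorded_hash = double_sha256(stored)  # hash recorded when the data was first validated

damaged = flip_bit(stored, 123)
print(len(damaged) == len(stored))               # True: size alone reveals nothing
print(double_sha256(damaged) == recorded_hash)   # False: only re-hashing exposes the change
```

Detection costs a full read and hash of the data, which is why doing it for every block at every startup would be slow.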

rebroad commented 7 years ago

@Ayms I would hope it would get detected during a reindex, but not during a normal startup as to check every block on disk requires a reindex (or rescan?). There have been proposals in the past about making 'torrent-like' block download available, among various proposals. I think a feature to download headers in reverse would be worthwhile adding at some point. Not sure of the best place to suggest to join in the discussion on this - here is as good (if not better) as any, IMHO.

Ayms commented 7 years ago

Was waiting for the current reindex to finish but running a little bit out of patience, started last Saturday, should finish today, so more than 8 days...

So indeed there is no point to check now anything related to potential bandwidth issues

I sent my conf + debug.log, don't feel it's so obsolete, or there is something really really wrong with my test computer but don't see it

There have been proposals in the past about making 'torrent-like' block download available, among various proposals

Which ones?

I think a feature to download headers in reverse would be worthwhile adding at some point

Indeed, worthwhile looks to be a weak word, it is required, we can't wait for more than a week to retrieve something that can be retrieved in less than a day

Not sure now what could be the best solution but maybe something like incremental torrents containing the blocks + chainstate (would this be feasible?), built every two weeks (ie one torrent every two weeks) and distributed by the bitcoin peers who would implement a torrent-like protocol then, maybe compatible with bittorrent so other peers can seed them too

If the blocks size does not change this still works in 200 years, if the blocks size increases then it's another story

Or something like you suggest

Maybe this can be combined with the bittorrent DHT, saw also some discussions about it but some comments I read about insecure and counter performing DHT are for me completely wrong, or just a shortcut trying to eliminate the potential of DHT based systems, of course the bittorrent DHT is everything but secured, but it just depends on how you use it (example: torrent-live), not uninteresting for example ("incremental"/"mutable"): https://torrentfreak.com/mutable-torrents-proposal-makes-bittorrent-resilient-160813/

Not sure of the best place to suggest to join in the discussion on this - here is as good (if not better) as any,

If there is an interest we can probably discuss it somewhere

rebroad commented 7 years ago

@jonasschnelli Can you please elaborate on what these verifications are you mentioned? Surely it should be enough to verify that the block hashes are correct - which is a fairly quick operation. (If working in reverse, I mean).

sipa commented 7 years ago

@rebroad Full validation needs a lot more than checking hashes. It also includes finding all inputs of a block's transactions in the chainstate database, executing all scripts, and writing an updated chainstate with removed spent UTXOs and added created UTXOs back. But as explained to you several times now by @laanwj, this is not the place for basic questions about a node's operations.
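The steps listed above (look up each input in the chainstate, remove spent outputs, add created ones) can be sketched with a toy UTXO set. This is a heavily simplified illustration under assumed data structures: transactions are plain dicts, script execution and signature checks are omitted entirely, and `apply_block` is a hypothetical name, not a Bitcoin Core function.

```python
def apply_block(utxos, block):
    """Check a block against the UTXO set and return the updated set.

    utxos maps (txid, output_index) -> value. Raises ValueError if an input
    is missing or already spent, or if a transaction creates more than it spends.
    """
    updated = dict(utxos)
    for tx in block:
        spent_value = 0
        for outpoint in tx["inputs"]:
            if outpoint not in updated:
                raise ValueError(f"missing or already spent input: {outpoint}")
            spent_value += updated.pop(outpoint)   # remove the spent output
        # Script execution would happen here; it is omitted in this sketch.
        if tx["inputs"] and sum(tx["outputs"]) > spent_value:
            raise ValueError("outputs exceed inputs")
        for index, value in enumerate(tx["outputs"]):
            updated[(tx["txid"], index)] = value   # add the newly created outputs
    return updated


# A transaction with no inputs stands in for a coinbase in this toy model.
block1 = [{"txid": "a", "inputs": [], "outputs": [50]}]
block2 = [{"txid": "b", "inputs": [("a", 0)], "outputs": [30, 20]}]

state = apply_block({}, block1)
state = apply_block(state, block2)
print(sorted(state.items()))  # [(('b', 0), 30), (('b', 1), 20)]
```

Applying `block2` to an empty set raises `ValueError`, which is the toy version of why a block can only be fully validated once the state produced by all earlier blocks is available.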

rebroad commented 7 years ago

@sipa yes, I know this, but we are exploring a "torrent-like" alternative way to download the blockchain - how is it possible to download a 12GB movie without corruptions of the data, but not 12GB of blockchain in the same amount of time and CPU usage? Surely if someone has the checksum of the data, and trusts this checksum, then this should be all that is required.

To put this in perspective, when downloading Bitcoin-QT, you expect downloaders to trust the checksum which has been signed by various developers. In the same way, a checksum should be all that is required to download the blockchain also. No further validation should be needed, based on the same level of trust.

I am proposing that the hash of the latest block can be that checksum for the entire blockchain, and that if enough trusted sources agree on that hash, then it's overkill to do more. This is after all, the same degree of security being used for the propagation of the bitcoin-qt binaries.

And please, less of the "this is not the place" comments. This is the place (until there's a better place).

sipa commented 7 years ago

The point of a full node is that you do not have any hash to trust. If you do, you can run a lightweight client instead.

rebroad commented 7 years ago

@sipa I mean no disrespect, but what you suggest seems irrelevant to the points I am making. People want to help by running full-nodes. Allow them the option to trust some other nodes (or the developers of Bitcoin Core, as you already are doing) in order to get up to speed quicker. Your argument is arguing against compiled binaries being used, in effect, and is saying that everyone should go over every line of code before running it.

i.e. you are being inconsistent, applying one standard to the code, and another to the blockchain.

sipa commented 7 years ago

@rebroad The code is open source, you can verify (through Gitian) that the downloadable binaries match the source code, and you can in fact check every line of code if you wanted to (or rely on others who do the same).

Downloading the chainstate from someone else however means you are not verifying anything anymore, and you're even unable to do so at all, without actually going back and rebuilding everything.

There have been various models proposed where miners commit to the resulting chainstate at every block, or every so many blocks, in a way that permits bootstrapping faster by trusting these miners up to some point in history, and then fully validating everything after it. These have however not been implemented, as it may be controversial due to the high extra costs for validation it would bring (every full node would need to check that the commitment matches reality). And even if there were, this is still a big change in security.

rebroad commented 7 years ago

@sipa I do not think you are expecting everyone to check every line of code. I think, given the way bitcoin binaries are distributed, that you are expecting people to trust the people who have signed the gitian compiled code. Therefore given trust is already being expected, we're just talking about extending it to another area that is also exploitable (by those trusted). i.e. anything trusted could be used to steal bitcoins, so the security is not weakened in any way, given that the trust is already being given. People can still choose whether or not to trust - trust should be optional. But it's already an option for the code - make it an option for the blockchain too.

There are already many places where people can check the latest block height, hash, etc. Perhaps even the outbound connections the average node makes (once addrman is fixed) would be sufficient validation to be sure that the correct checksum/hash is obtained - the user could be invited to make a manual further check - perhaps checking one or more of the numerous websites out there that publish the latest block height and hash. I think most full-node operators would be comfortable doing this.

sipa commented 7 years ago

The point is that you can verify lines of code, and have the ability to check gitian signatures. Certainly not everyone reads through every line of code, but jointly people would discover malicious changes.

If you set up your node with just a blindly trusted chainstate, you cannot in any way verify its correctness anymore. And if you're fine with running a full node on blind trust, I think you're completely missing the point of having a full node in the first place.

In any case, I'm done arguing this. There are a lot of philosophical discussions to be had about this, but this issue is not the right place for it.

rebroad commented 7 years ago

@sipa Yes, and with my proposed solution, you will still have the option to verify the blockchain (I expressly stated that trust should be optional) - you don't appear to be reading half of what I am writing.

sipa commented 7 years ago

@rebroad I am incredibly annoyed with your responses here.

You asked @jonasschnelli what verification he was talking about, claiming that it should just be some hashes being verified. I clarified that block validation is much more than just checking hashes, which led us on a tangent where you start elaborating that people trust the code anyway so it doesn't matter. Of course, if you're not going to validate, then whatever Jonas was talking about does not matter anymore, right? In particular:

Perhaps even the outbound connections the average node makes (once addrman is fixed) would be sufficient validation to be sure that the correct checksum/hash is obtained

makes me think you shouldn't touch Bitcoin with a 10-foot pole. Your suggestion would allow anyone who can sybil attack you (your ISP? state level entities? anyone with enough resources to mount an eclipse attack?) to print money at will, doublespend at will, and pretend to have paid you any amount.

Elsewhere you stated it should be optional, and operators could check the end result on a website. I expect this would result in a world where in practice nearly everyone would trust the same few websites, creating a huge systemic risk.

Maybe you were talking about the idea of not verifying blocks coming from trusted nodes? I can't follow anymore. In any case, that doesn't actually gain you much, as you'd still need to build the chainstate (which is a large part of the sync time on modern CPUs), even if you don't validate the scripts in it.

Please, use appropriate places and tone for discussions. A github issue about torrent-like downloading is not the right place to discuss fundamentally changing Bitcoin's security model - you only reach a few developers that way. And I think there are much more inviting ways to ask for clarification about this issue than "Surely it should be enough to verify that the block hashes are correct".

maflcko commented 7 years ago

I think this issue evolved into a quite thorough description of Bitcoin's trust model. Please keep in mind that issues are meant for tracking problems with the code base of Bitcoin Core and meta discussion should happen in other places. Please also note that a suggestion was already posted to the mailing list several months ago (and discussed multiple times in different bitcoin related forums).

Due to the length and content of this issue it is no longer possible to use it productively to improve Bitcoin Core and I feel that people's time is wasted by leaving it open. Thus, I will close the issue.

rebroad commented 7 years ago

@MarcoFalke Thanks for the link to the mailing list. Although, I'm not sure from that read-only webpage, how I'm supposed to contribute to that discussion, so for now it will have to be on here until a read-writable alternative is proposed. I do not consider this conversation to have gone out of scope, so am confused why this issue has been closed as I feel headway is being made with regards to finding a solution to this issue.

@sipa I am unfamiliar with how the chainstate and UTXO would be created following the download of the blockchain - I am guessing that these would need to be reconstructed before the wallet functionality and ability to validate TXs became available, yes? Is there any way these could be shared between nodes in a way that reduced the work needed to construct them? Is there any documentation I can read that explains the structure/format of the chainstate and UTXO please?

laanwj commented 7 years ago

Although, I'm not sure from that read-only webpage, how I'm supposed to contribute to that discussion

Don't pretend to be stupid, sign-up instructions are also readily available: https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev

Locking this issue as people just won't listen.