bitpay / insight-api

The bitcoin blockchain API powering Insight
https://github.com/bitpay/insight
'Bitcoin JSON-RPC: Work queue depth exceeded. Code:429' #492

Open levino opened 8 years ago

levino commented 8 years ago

It looks like this happens when many requests arrive in parallel. I would say this needs to be fixed ASAP. It more or less renders the API useless, since it is very easy to mount a DoS attack on it.

Since the current implementation passes each query straight on to the bitcoind RPC interface, this is exactly what should be expected to happen. All queries must be queued globally so that bitcoind is not overloaded.

levino commented 8 years ago

Just do 200 queries in parallel.

braydonf commented 8 years ago

Running multiple nodes, likely internal/private ones, is the way to make it handle more requests.

levino commented 8 years ago

@braydonf Where do you want to continue this?

levino commented 8 years ago

My opinion: It is acceptable for queries to become slow if a lot of queries arrive at the same time, but they should not return errors.

Possible solution: Queue ALL queries to bitcoind on a global level for the running node process. I have implemented a queuing solution with async here, which could be reused either in bitcore-node or in the insight-api. It does not queue on the global level, though.

levino commented 8 years ago

So I wrote a script / test that makes the Insight API inaccessible for everyone else. I am reluctant to share it publicly: one just starts it and keeps it running, and the API is permanently down. I am happy to share the script privately. I think this issue should be addressed ASAP.

slush0 commented 8 years ago

I second this. A global queue might solve the problem easily.

With the current bitcore architecture, even a single user of myTREZOR can DoS the whole server just through regular usage. This was not an issue in the previous architecture, which is still used by the production myTREZOR app and serves the blockchain to thousands of users.

elichai commented 8 years ago

I have this error too. I get the "Work queue depth exceeded" error a lot. Are there any solutions?

levino commented 8 years ago

No. It is not a bug; it is, so to speak, what should be expected from the new setup. The API simply does not scale in the current version. Try to "come back later" or use your own private servers with the API running.

levino commented 8 years ago

"Global queue might solve the problem easily."

I do not think so. It should be very easy to flood the global queue. One should additionally introduce rate limiting, for example per IP address. Maybe a "queue per IP" could work, where one query at a time is taken from each per-IP queue in turn and fed into a second, circular queue... This is a clusterf**k because bitcoind just does not scale.
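
The "queue per IP" idea can be sketched as a round-robin queue: each client key gets its own list, and the consumer takes one item from each key in turn, so one client with a long backlog cannot starve the others. This is only an illustration under that assumption; `RoundRobinQueue`, `enqueue` and `dequeue` are hypothetical names, and how the key is derived from a request is left open.

```javascript
// Hypothetical sketch of a fair "queue per IP" with round-robin dequeue.
class RoundRobinQueue {
  constructor() {
    // key (e.g. client IP) -> array of waiting items.
    // A Map iterates in insertion order, which serves as the rotation order.
    this.queues = new Map();
  }

  enqueue(key, item) {
    if (!this.queues.has(key)) this.queues.set(key, []);
    this.queues.get(key).push(item);
  }

  // Take the next item from the key at the front of the rotation and
  // move that key to the back. Returns undefined when nothing is waiting.
  dequeue() {
    const first = this.queues.keys().next();
    if (first.done) return undefined;
    const key = first.value;
    const items = this.queues.get(key);
    const item = items.shift();
    this.queues.delete(key);
    if (items.length > 0) this.queues.set(key, items); // re-insert at the back
    return item;
  }
}

const fair = new RoundRobinQueue();
fair.enqueue('10.0.0.1', 'a1');
fair.enqueue('10.0.0.1', 'a2');
fair.enqueue('10.0.0.1', 'a3');
fair.enqueue('10.0.0.2', 'b1');
const order = [fair.dequeue(), fair.dequeue(), fair.dequeue(), fair.dequeue()];
console.log(order.join(' ')); // a1 b1 a2 a3
```

The second client's single query is served right after the first client's first one, instead of waiting behind all three.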

elichai commented 8 years ago

I use it on my own server running with bitcore-node.

slush0 commented 8 years ago

Is the performance of the newer bitcore-node (using bitcoind over RPC) expected to be much worse than that of the original architecture? If so, are there any plans to keep the original bitcore-node maintained (patch for 0.13, ...)?

You're saying that bitcore does not scale well. I cannot second this; we have not had any performance issues so far. The only problem seems to be the artificial throttle in the form of the RPC API between bitcore and bitcoind.

braydonf commented 8 years ago

@slush0 For clarity, the performance in the latest version (using indexes in bitcoind) is many times better than in the previous one. The reads to LevelDB match those from the LevelDB benchmarks. Queries for txids, utxos and balances for addresses should all be improved, as well as the performance while reindexing. The typical bottleneck now is disk speed and LevelDB block compaction.

I would take a look at adjusting the configuration options for the work queue limit, timeout and threads available for RPC. I would also monitor blocks/index/LOG for activity.

rpcworkqueue=<n>
rpcthreads=<n>
rpcservertimeout=<n>

levino commented 8 years ago

Interesting. What are the defaults for these values, and what are sane settings? I cannot find consistent documentation for them.

About the performance / stability: I challenge you to name one deployment that I cannot DoS with zero effort, @braydonf

ghost commented 7 years ago

I'm also having the same issue while receiving a new block and querying the api, not while reindexing. Any solutions yet? @braydonf @Levino

technyne commented 7 years ago

I'm having the same issues, new installation, can't even get past 3% sync to the network before depth is exceeded. I am going to try implementing some of the changes suggested here and elsewhere but it seems that no one from the dev team is responding to these requests anymore...

rachyandco commented 7 years ago

@Levino you can find the default values in httpserver.h

static const int DEFAULT_HTTP_THREADS=4;
static const int DEFAULT_HTTP_WORKQUEUE=16;
static const int DEFAULT_HTTP_SERVER_TIMEOUT=30;

karelbilek commented 7 years ago

see PR above

levino commented 6 years ago

As I said in https://github.com/bitpay/bitcoind-rpc/pull/23#issuecomment-357864919, I have since come to doubt that a simple queue will do anything to help. We need spam protection, so basically a queue per user (identified, for example, by IP address) with a maximum queue size, which returns error 429 (Too Many Requests) once the queue size for this user has been reached.

Of course, mapping IP addresses to users is difficult when load balancing with reverse proxies is used and IP addresses are poorly forwarded. But if that is done correctly, the solution would help to get from a DoS vulnerability to a DDoS vulnerability.

A global queue will still be needed on top of this spam protection in order to prevent errors for normal users, so the solution suggested in https://github.com/bitpay/bitcoind-rpc/pull/23#issuecomment-357864919 might still be useful.
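
The per-user limit described above can be sketched as a counter of outstanding requests per client key, answering with 429 once the limit is reached. This is a rough illustration, not code from insight-api or bitcoind-rpc; `PerClientLimiter`, `handle` and `maxQueued` are hypothetical names, and the key is assumed to be a correctly forwarded client IP address.

```javascript
// Hypothetical sketch: cap the number of outstanding requests per client
// key (for example an IP address). All names here are illustrative.
class PerClientLimiter {
  constructor(maxQueued) {
    this.maxQueued = maxQueued; // max outstanding requests per client
    this.counts = new Map();    // client key -> outstanding request count
  }

  // Reserves a slot and returns true, or returns false at the limit.
  tryEnter(key) {
    const count = this.counts.get(key) || 0;
    if (count >= this.maxQueued) return false;
    this.counts.set(key, count + 1);
    return true;
  }

  // Frees a slot once the request has finished (successfully or not).
  leave(key) {
    const count = this.counts.get(key) || 0;
    if (count <= 1) this.counts.delete(key);
    else this.counts.set(key, count - 1);
  }
}

// Runs `work` (a function returning a promise) for one client, or answers
// with status 429 straight away when that client's queue is full.
async function handle(limiter, key, work) {
  if (!limiter.tryEnter(key)) {
    return { status: 429, body: 'Too Many Requests' };
  }
  try {
    return { status: 200, body: await work() };
  } finally {
    limiter.leave(key);
  }
}
```

In a full setup, `work` would itself hand the query to a global queue in front of bitcoind, so that a burst from many well-behaved clients is delayed rather than rejected.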

levino commented 6 years ago

Possible solution:

The second solution is rather a configuration / deployment task which could be described in a "best practice" document.

I would much prefer this over a built-in spam protection in the insight-api, which feels like mixing concerns at the wrong point in the stack.

winteraz commented 6 years ago

This happens with no requests received as well. I just started the service, and during indexing I got the same error even though I had all the inbound ports to the API blocked.

charleslcso commented 6 years ago

Dudes, a new install here. Synced to 8% and got this error... three 24-hour rounds of syncing... not sure if it's gone.

I guess there's no solution in sight.

BTW, the Traefik link above seems to be gone... not that I could understand it quickly...

levino commented 6 years ago

Traefik 1.5 has been released. I fixed the link. It does not have much to do with the problem you are encountering, though: you cannot rate limit the requests during sync. Did you try rpcworkqueue=512 in your bitcoin.conf?

charleslcso commented 6 years ago

I have no idea what to try... ha ha ha. Thanks, I will test with your recommendation.
