AustEcon / bitsv

BitSV: Bitcoin made easy. Documentation:
https://AustEcon.github.io/bitsv
MIT License
97 stars 28 forks source link

Retry if the http return is incorrect. #19

Closed carpemer closed 5 years ago

carpemer commented 5 years ago

Sometime, the bitindex will return 502. Add a retry function to make sure the code will not stop if this happens.

e.g: Expecting value: line 1 column 1 (char 0), Retrying in 1 seconds...

ghost commented 5 years ago

Hmm, this is a bit different. Usually in bit/bitcash/etc, we've just had different APIs to try. Could we just add another API?

If we're getting that json parsing error, there's probably a spot where we are calling r.json() without r.raise_for_status() or checking the status code manually.

I like the idea, but this is a fairly notable change so I'm a little hesitant. If we do this here, I think maybe bit and bitcash should get it as well. I like keeping the code bases sort of similar as they have a ton in common.

carpemer commented 5 years ago

Sorry, What you mean "different API"?

Even r.json() raise a json parson error, in general, it is caused by a 502 code. It supposes to be a transition error and should be able to resolve by the retry (catch Exception) here.

What is bit and bitcash? Is that for BSV?

ghost commented 5 years ago

So we can try hitting the same API over and over, or we can have different APIs that we try in series. You can see this in bitcash and bit.

https://github.com/ofek/bit https://github.com/sporestack/bitcash

(bitsv's predecessors)

We shouldn't even try to parse json if the API returns non 2XX. The exceptions being caught are ConnectionException and something else. That will raise an error that won't get caught and we'll stop right there. It's also kind of ugly in my opinion.

carpemer commented 5 years ago

If we retry with different APIs, one possible risk is different APIs are not in the same status, such as the UXTO. Then the tx been built later will be illegal in one API but legal in another. This will make the debugging very difficult and confusing, which I believe we should avoid.

Consider the real business usage in the future, I think the client will prefer to stop syncing the data instead of writing some wrong data.

Do you mean you wish to add something like: if r.status_code != 200: raise ConnectionError to retry before parse?

I have no strong preference here, per my understanding, both ConnectionError and ParseError (as a consequence) could be covered by the current code.

The only reason I did not add those condition check is to avoid those duplicated code. Of course, If you like I could add those 200 condition check to each function.

ghost commented 5 years ago

If we retry with different APIs, one possible risk is different APIs are not in the same status, such as the UXTO. Then the tx been built later will be illegal in one API but legal in another. This will make the debugging very difficult and confusing, which I believe we should avoid.

Yes, agreed. This is already the case. Being able to easily toggle APIs is probably the best option here. I'm saying that this is fairly tried and tested in bit and bitcash. It's not perfect in those, but this really should only matter for 0-conf transactions. Which are important, for sure, but if you need a lot of consistency you should really be running a local daemon or forcing use of one API.

If we could configure API ordering and retry behaviors, I think that would be ideal. Of course some day being able to speak the Bitcoin RPC protocol would be ideal.

About the json parsing, here's a change for bit recently that's probably a little better than what I was doing before: https://github.com/ofek/bit/pull/68

I have no strong preference here, per my understanding, both ConnectionError and ParseError (as a consequence) could be covered by the current code.

It's not, unless I'm missing something.

https://github.com/AustEcon/bitsv/blob/master/bitsv/network/services.py#L317

I mean, with the status code checking we should be okay. But if you are seeing json parsing errors, it almost certainly means there's part of the code where we are not doing that. Or where one of our APIs is being evil and throwing an error without a 5XX. I just don't like that json error, the user should never see that as it's confusing and doesn't make it obvious what the real problem is.

@AustEcon, do you have any thoughts on this?

AustEcon commented 5 years ago

I can't look at this properly right now. Almost at work for a busy day.

There are a few things to sort out. I do want to work towards having the NetworkAPI class working well (what PrivateKey.send() etc uses) and handling retrys and redundancy/ alternative apis intelligently and in a way that ideally is compatible with bit and bitcash syntax as much as possible as @Teran-mckinney says...

So my initial thoughts (without any feedback) were to make NetworkAPI do retrys then if fail switch EVERYTHING over to a different api to stop sync issues that I suspect arise from mixing and matching different apis in NetworkAPI)

PS: See https://github.com/AustEcon/bitsv/pull/17#issuecomment-488541697 for why the above 502 error occurred...

As for the handling of status codes... I will defer to you guys.

I don't have much experience with network connectivity stuff. @teran-mckinney do you think adding this retry wrapper to all of the BitIndex and WhatsonchainAPI functions individually is a good idea? Or should we do the same thing but only to the NetworkAPI section (where all of the logic re: retrying and toggling over to alternative APIs can be handled in one place?)

I am happy to go with whatever you think is best.

carpemer commented 5 years ago

Good idea to use r.raise_for_status(), it is clear. I am also fine to move the logic into NetworkAPI .

I just have some concern about switching the backend.

Consider we are tracking the unspent in the memory, and writing the data to the block, which means the library is not in stateless, If we switch to another backend implicitly, It might be risky to maintain the consistent status?

My understanding is the NetworkAPI solution (GET_BALANCE_MAIN=[]) might be a good idea to switch backend, if the library is a read-only lib, just show some data to the client. But now the API has included the broadcast operation, I am not sure is that a right choice for us to do this now.

I don't have much background about the existing design. Appreciate if you guys could share more insight about the design here.

ghost commented 5 years ago

I'm sorry, I still need to look this over.

Looks like we also have this: https://github.com/AustEcon/bitsv/pull/20

Heading to bed for now.

AustEcon commented 5 years ago

Yes. We have been discussing with the hackathon on that we want to wire this up. @joshua-s has to sleep now. But I am very happy with what he's done from my POV. No paths will be changed but now well organised into modules.

As more APIs get added will keep things tidy this way. I'm testing now.

ghost commented 5 years ago

Or should we do the same thing but only to the NetworkAPI section (where all of the logic re: retrying and toggling over to alternative APIs can be handled in one place?)

I definitely prefer this route, having retry and service switching more central.

@carpemer, your points about keeping API use consistent do make sense. But I think most APIs will broadcast a "valid" TX and not complain about unspents. And this would only happen if the default was down.

I still think this can never be "production quality" without a native Bitcoin daemon. Which neither bit, bitcash, nor bitsv supports... yet. But it can work fairly well, all things considered.

AustEcon commented 5 years ago

@teran-mckinney, @carpenter and I had a chat in slack earlier and basically came to same conclusion.

After the recent refactoring it reinforces going this approach (with all of the redundancy handling stuff in network.py). So we can possibly expect a PR from carpenter in a couple of weeks when he gets some time to work on it.

AustEcon commented 5 years ago

I'll close this PR for now seeing as though a lot has changed since then and we know what the reason was for the 502 error.