bigchaindb / bigchaindb-driver

Official Python driver for BigchainDB
https://www.bigchaindb.com
Apache License 2.0
105 stars 104 forks source link

BEP 14: Roundrobin connection strategy #430

Closed tucanae47 closed 6 years ago

tucanae47 commented 6 years ago

Description

Added support for BEP/14 to improve Drivers Reliability Added new test set for mutiple nodes

The approach implemented is the roundrobin connection strategy previously discussed on bep14 issue

How to QA

For running tests with docker i used suggested approach on contributing guideline command with an addition to test multiple nodes:

$ docker-compose up -d bigchaindb
#add line to run multiple bigchain nodes on the same docker network
$ docker-compose up -d --scale bigchaindb=5
$ docker-compose run --rm bigchaindb-driver pytest -v tests/test_driver.py
codegeschrei commented 6 years ago

hey, one general thing I noticed: some doc strings have changed. Can you check your changes and update + add the doc strings accordingly?

tucanae47 commented 6 years ago

Thank you guys for the review, I'm pushing a new commit with the suggested changes. I agree with mostly everything, so I updated the implementation accordingly. Also, I updated the missing docstrings.

codegeschrei commented 6 years ago

Hey, did you install pre-commit? in transport.py some checks failed for me.

tucanae47 commented 6 years ago

no i didn't sorry, just installed and fixed accordingly.

kansi commented 6 years ago

This PR looks good but I have one concern though,

self.get_connection() decides if nodes are available only if there are retires left for that node i.e.

        for node in self.retries:
            if self.retries[node] > 0:
                return True
        return False

and the next node is picked if the existing node has been put into delay i.e.

        if current_time_ms > picked_time:
            node = connections[self.picked]["node"]
            return node
        else:
            self.next_node(connections)
            node = connections[self.picked]["node"]
            return node

So, for a network of size, say 10 nodes, the max_tries for each node would be 40. Since the available nodes are only decided based on their available retries a net total of 400 retries will be attempted for a given failing request before the driver gives up and raises an exception.

Two questions:

/cc @vrde

tucanae47 commented 6 years ago

Travis tests are failing as stated by @ttmc on this comment.,i believe will pass with that PR merged

codegeschrei commented 6 years ago

@tucanae47 I fixed the tests. You can just merge master into your branch and it should work :)

ldmberman commented 6 years ago

so the last exception raised will have more priority over the timeout, and the goal is to inform (give a hint) the user what is wrong with the last node used

@tucanae47 that's a good point! What do you think about collecting all the errors that occurred? It should be even more helpful.

Also, when the global timeout expires, I would append a custom timeout error to the list instead of mimicking an actual HTTP error cause in this case it's not an HTTP error obviously.

tucanae47 commented 6 years ago

Hey guys, i pushed new changes with last review suggestions. Main change besides review is related with @ldmberman suggestions regarding nodes normalization, so i (propose) implicitly removed headers parameter from Transport class, and let normalization with headers to happen in utils, so we only deal with one kind of format to use internally which is:

[{'endpoint': 'https://test.bigchaindb.com',
     'headers': {'app_id': 'your_app_id',
                 'app_key': 'your_app_key'}},
    {'endpoint': 'https://test2.bigchaindb.com',
     'headers': {'app_id': 'your_app_id',
                 'app_key': 'your_app_key',
         'extra_header': 'extra value'}}...]
ldmberman commented 6 years ago

@tucanae47 thank you for the update! The only part I would like to further refine I described here. What are your thoughts on it?

codegeschrei commented 6 years ago

If you change something again, could you merge master into your branch again? We had some issues with codecov and now it is working again :)

codecov-io commented 6 years ago

Codecov Report

Merging #430 into master will decrease coverage by 0.42%. The diff coverage is 98.41%.

@@            Coverage Diff             @@
##           master     #430      +/-   ##
==========================================
- Coverage     100%   99.57%   -0.43%     
==========================================
  Files           5        5              
  Lines         192      233      +41     
==========================================
+ Hits          192      232      +40     
- Misses          0        1       +1
tucanae47 commented 6 years ago

@ldmberman please have a look at the new push, basically im collecting a dict of errors raised per node:

exceptions = {}
while time_left.total_seconds() > 0:
    try:
        connection = self.get_connection(time_left)
        response = connection.request(..)
        return response.data
    except TransportError as err:
        end = time()
        time_left -= timedelta(seconds=end - start)
        self.pool.fail_node()
        start = time()
        exceptions[err.url] = {"code": err.status_code, "info": err.info}
    except BaseException as err:
      ...
exc_cls = HTTP_EXCEPTIONS.get(504, TransportError)
if len(exceptions):
    raise exc_cls(504, "Gateway Timeout", exceptions)
...

as of the actual error code and exception to return i think is good enough to use aws http load balancers 504 timeout error

ldmberman commented 6 years ago

i think is good enough to use aws http load balancers 504 timeout error

It's appropriate if you implement an HTTP server. Here you might make the user think there is actually an HTTP server somewhere which closed the connection after a timeout.

What do you think about defining a custom exception and raising it there?

class TimeoutException(BigchaindbException):
   pass
tucanae47 commented 6 years ago

Hi, just merged master and it seems codecov is failing, not sure how to fix it.

codegeschrei commented 6 years ago

Hey, the codecov can be ignored for now. It is just complaining about one line :)