druid-io / pydruid

A Python connector for Druid
Other
507 stars 198 forks source link

Add support for asynchronous client #39

Closed turu closed 8 years ago

turu commented 8 years ago

In a nutshell: This pull request adds, in a fully backward compatible way, a new optional, asynchronous client to the library.

Why this matters? Druid can be very useful in many scenarios different than ad-hoc analysis of online data - f.e it can be successfully used to create flexible and interactive analytics dashboards or monitors. Applying asynchronous programming style is often a go-to architectural choice, when designing such systems - especially, when they are implemented in single threaded languages like python. The lack of asynchronous client can therefore be a serious problem. This pull request fixes it.

What has been done?

(accidentally ;)) Fixed issues:

  1. https://github.com/druid-io/pydruid/issues/29
  2. https://github.com/druid-io/pydruid/commit/fa3c3660dadf106881bb92f4456acae5c0fab0d0 fixes the non-deterministic behavior of test_export_tsv. The issue originated from the fact that python dictionaries are unordered.

A note on implementation of AsyncPyDruid: Asynchronous client uses elements of Tornado framework. This was a conscious choice based on the fact that Tornado is stable and mature, and integrates very well with python’s 3.5 native async support (async, await keywords) as well as other asynchronous frameworks. As a result one is not locked into using Tornado for the whole of his/her app. See: http://www.tornadoweb.org/en/stable/guide/coroutines.html#python-3-5-async-and-await

A note on backward compatibility: As a result of refactoring, clients are reusable. This is achieved through clients’ query methods returning Query objects filled with results, rather than result dictionary. However, this change is fully backward compatible - clients still expose the same (now deprecated) interface allowing one to access details of the most recently executed query. Query objects on the other hand, act as wrappers around result dictionary, giving users the same dictionary interface they had before, enriched to include methods for exporting results and accessing query details.

A note on python 2.x <-> 3.x compatibility: The changes have been tested and should work reliably both in python 2.x and 3.x

Additional note on the authors: Changes introduced by this pull request have been originally developed in https://github.com/turu/pydruid by me (github.com/turu), and github.com/l-j. Since then, they were forked to our organization (github.com/Dreamlab) and they are already used in production. We both signed the CLA.

Best regards, Piotr Turek

fjy commented 8 years ago

@turu Thanks for this contrib. Do you mind filling out the CLA here: druid.io/community/cla.html

turu commented 8 years ago

@fjy I already did. Can you verify that you have my CLA registered?

drcrallen commented 8 years ago

Note there's a commit from https://github.com/l-j in here

drcrallen commented 8 years ago

and there's a cla on file for @l-j and @turu

turu commented 8 years ago

@fjy @drcrallen Is there any update on the status of this pr?

fjy commented 8 years ago

@turu apologies for delay, will ask people to finish reviewing this

@drcrallen @xvrl please review

fjy commented 8 years ago

@turu can we resolve the merge conflicts and I will merge this

turu commented 8 years ago

@fjy I resolved merge conflicts. To the best of my knowledge, everything should be ok. All tests are passing.

xvrl commented 8 years ago

@turu Thanks for the comprehensive description, a few questions:

xvrl commented 8 years ago

Otherwise I'm :+1: on this one, assuming someone has tested that the existing client still works well in practice.

turu commented 8 years ago

@xvrl Regarding the points you mentioned: