Netflix / dynomite

A generic dynamo implementation for different k-v storage engines
Apache License 2.0

Support for kyoto tycoon #615

Open kallaballa opened 6 years ago

kallaballa commented 6 years ago

I'd like to ask whether there are any plans to support Kyoto Tycoon as a backend. If not, I'd like to find out what it would take to implement it, since I might do it myself.

kallaballa commented 6 years ago

@ipapapa stated on Gitter that neither he nor, I guess, anyone else has looked into it. I will try to find out what it takes to implement Kyoto Tycoon support. Btw, the main protocol (it also supports others) is a TSV-RPC implementation: http://fallabs.com/kyototycoon/spex.html#protocol

ipapapa commented 6 years ago

Some good information can also be found in #310. I specifically describe the Mongo work there, which can be used as a guide for this work. The first step is adding a protocol parser in proto. Currently, there are Memcached and Redis parsers, with Redis being the one we use in production.

kallaballa commented 6 years ago

OK, I did read through the whole wiki and I tried to wrap my head around the code in proto. But there is something I really don't understand, and that is the msg struct. Could you elaborate on it?

smukil commented 6 years ago

@kallaballa The msg struct comes as part of the fork from twemproxy. The best way to analyze the struct is to look at it under GDB to see how it's accessed/modified during a query.

Some of the important fields are: struct mhdr mhdr; This is the 'mbuf' header and is initialized here: https://github.com/Netflix/dynomite/blob/dev/src/dyn_mbuf.h#L47 It contains a list of 'mbuf'(s) which are the buffer(s) that contain the request/response strings.

c_tqe, s_tqe and m_tqe are entries in a linked list and an instance of the msg struct can be a part of any one of those (depending on client/server, etc.)

'parser' is the function called on the query to determine how to parse it according to the underlying storage layer used. This is where the parser is set according to the storage layer: https://github.com/Netflix/dynomite/blob/dev/src/dyn_message.c#L414-L440

'rsp_handler' executes some logic when a response is received. For example, under DC_SAFE_QUORUM, it makes sure that the checksums of the data from different replicas match: https://github.com/Netflix/dynomite/blob/dev/src/dyn_client.c#L1025-L1039

Hope this helps.

Note: Over time, we will be adding more code documentation, and be simplifying the structs to make the code easier to understand.

kallaballa commented 6 years ago

I started on implementing the parsing code: https://github.com/kallaballa/dynomite/commit/8b20830834f65c1a339f8edc2054a8fd70abf536

The code compiles but it is completely untested. Also, please note that I'm a C++ dev and don't know my way around C well. Am I going somewhere with this?

ipapapa commented 6 years ago

You also need to add a data_store number so that one can choose your proto, and I assume there are more changes to be made. In the end, it would be nice to expand the testing framework to add some unit/functional tests.

kallaballa commented 6 years ago

OK, I've got it to the point where Dynomite is able to forward messages between a single node and a single client: https://github.com/Netflix/dynomite/compare/dev...kallaballa:dev But the parser code is error-prone and cumbersome. Would it be acceptable if I used https://github.com/h2o/picohttpparser for parsing?

kallaballa commented 6 years ago

I completed the parser code and it works fine; Dynomite is forwarding messages correctly. How do I proceed? E.g., how do I make Dynomite aware of keys?

ipapapa commented 6 years ago

I am not following your question. If Dynomite is doing what it is supposed to do, what do you mean by Dynomite being aware of the keys?

kallaballa commented 6 years ago

I thought that the following functionality requires Dynomite to be aware of which keys are present on which node.

A client can connect to any node on a Dynomite cluster when sending write traffic. If the Dynomite node happens to own the data based on its token, then the data is written to the local datastore server process and asynchronously replicated to other racks in the cluster across all data centers. If the node does not own the data, it acts as a coordinator and sends the write to the node owning the data in the same rack. It also replicates the writes to the corresponding nodes in other racks and DCs.

kallaballa commented 6 years ago

Let me phrase it more precisely. At the moment, all the Kyoto Tycoon code does is parse the message and discard keys and values. I can see that the Redis code and the Memcached code do more than that, e.g. they manipulate msg->keys and msg->token. What do those fields do? Are there other fields I have to set up?

smukil commented 6 years ago

@kallaballa 'keys' basically holds the array of all keys mentioned in the query. E.g., the Redis query 'smembers myset' would mean that the 'keys' array has one element, 'myset'.

It's then used in other places for additional verification. E.g., we check the 'keys' array here to make sure that all the keys accessed by an EVAL query are on the same node: https://github.com/Netflix/dynomite/blob/22b5022c1cb55c7d2530689691823d92a25078bc/src/proto/dyn_redis.c#L3139-L3159

The 'token' field is used internally to point to the current token being processed. For example, the above Redis query looks like the following over the wire when we start parsing the request:

(gdb) print r->token
$1 = (uint8_t *) 0x5555557e6860 "*2\r\n$8\r\nsmembers\r\n$5\r\nmyset\r\n"

As the query is being parsed, the token field points to the next token after the first one is parsed:

(gdb) print r->token
$4 = (uint8_t *) 0x5555557e6864 "$8\r\nsmembers\r\n$5\r\nmyset\r\n"

(gdb) print r->token
$10 = (uint8_t *) 0x5555557e6872 "$5\r\nmyset\r\n"

... and so on. (You can refer to this doc to understand the redis wire protocol if you want to draw parallels to Kyoto Tycoon: https://redis.io/topics/protocol)

kallaballa commented 6 years ago

Thank you so much. I think I've got enough to continue.

kallaballa commented 6 years ago

Is msg->pos always a null-terminated string, or is there another way to determine its length?

kallaballa commented 6 years ago

I found the answer: the length can be determined by b->last - b->pos. E.g.: https://github.com/kallaballa/dynomite/blob/aab424f44f8b43d51083ad735834d1f4dd513128/src/proto/dyn_tycoon.c#L283

kallaballa commented 6 years ago

I've successfully implemented parsing of the set, get and remove commands. Additionally, keys are parsed and added to the msg. Now I have two questions:

  1. What else is there to do to integrate Kyoto Tycoon?
  2. I would like to implement cursor-based KT commands. This means a cursor is created first, returning a descriptor for it, followed by a series of commands that use the descriptor. That clearly requires cursor-based commands to stick to the node that created the cursor. How would I implement that "stickiness"?
ipapapa commented 6 years ago

For (1): Do you have a functional prototype? It would drive you through what is needed. You can deploy a single-node Dynomite with a Kyoto Tycoon backend to figure out what is missing. Eventually, we might want to add some more testing scripts for Kyoto Tycoon so that changes made to Dynomite remain properly supported by the data stores (we usually try not to break other storage engines, but having some test coverage is good methodology). For (2): The Redis SCAN command (and its variants) is cursor-based. You might want to start by looking at that.

kallaballa commented 6 years ago
  1. Yes: https://github.com/kallaballa/dynomite
  2. Thanks, I'll check it out.
kallaballa commented 6 years ago

I have implemented all the commands my application needs (including the cursor-based ones), and it works in a single-node setup if I turn the Dynomite verbosity level up to 8 or more. Otherwise it breaks. Any idea what might cause this?

kallaballa commented 6 years ago

OK, I found the bug that broke the parser, though I still have no clue why the behavior changed with the verbosity level. How would I test a multi-node setup? E.g., how do I test whether conflict resolution and sharding work?

kallaballa commented 6 years ago

I created a three-node, single-rack setup using the following conf files: https://github.com/kallaballa/dynomite/blob/dev/conf/tycoon_rack1_node1.yml https://github.com/kallaballa/dynomite/blob/dev/conf/tycoon_rack1_node2.yml https://github.com/kallaballa/dynomite/blob/dev/conf/tycoon_rack1_node3.yml

Replication doesn't seem to work. When I set a key on node1, that key doesn't exist on node2 and node3. Any ideas? Could you maybe have a look at the code and find what's missing? https://github.com/Netflix/dynomite/compare/dev...kallaballa:dev

smukil commented 6 years ago

Hi @kallaballa. It may be the case that some debug/higher-verbosity statements have implicit behavior-changing effects, which are definitely bugs; if you find them, including a fix for that would be a great idea as well.

The best way to simulate a cluster is to look at the script in test/cluster_generator.py. It is a test script that sets up a simulated multi-node Dynomite cluster backed by Redis. Maybe you could try something similar with Kyoto Tycoon and verify it's working.

kallaballa commented 6 years ago

Well, I think we are not there yet. The code I've written is full of guesses, since all I did was try to mimic what the other parsers do. I'm pretty sure the code is missing a lot of what goes on in Dynomite. It would be really nice if someone could look at the code and guide me. I will still look into creating a cluster generator.

smukil commented 6 years ago

@kallaballa I can have a look at it, but my review might be slow since I don't have too much context on Kyoto Tycoon yet. I would suggest that if you make improvements, you can continue to add them. Additionally, documenting non-trivial parts of the code would be very helpful.

ipapapa commented 5 years ago

@kallaballa how far along are you in filing a PR?