c9s / r3

libr3 is a high-performance path dispatching library. It compiles your route paths into a prefix tree (trie). By using the constructed prefix trie in the start-up time, you may dispatch your routes with efficiency
http://c9s.github.com/r3/bench.html
MIT License
814 stars 83 forks source link

r3_tree_dump can now optionally dump to a string. Fixes #15 #21

Closed thedrow closed 10 years ago

thedrow commented 10 years ago

Sorry for the mess. Just cherry pick what you need.

c9s commented 10 years ago

Hi, can you add a test case for this? also can we simplify the tree_dump function? I saw there are two blocks use the same logics. (if (out) { A } else { B })

We can simplify the code by using functions like this:

void r3_edge_to_str(edge *e, char *buf, int *len);  // returns the output string with length
void r3_node_to_str(node *n, char *buf, int *len);  // returns the output string with length
void r3_tree_to_str(node *n, char *buf, int *len);    // returns the output string with length

And we can reuse the functions inside the r3_tree_dump() method. (call r3_tree_to_str and use printf to print it) and this keeps the API consistent and simple.

so if you need to dump the tree in string, you may run r3_tree_to_str and which can return a string for you.

c9s commented 10 years ago

By the way, you may run git rebase -i HEAD..[upstream HEAD] to rebase the commits.

thedrow commented 10 years ago

I'll do it this evening.

thedrow commented 10 years ago

@c9s Btw, why do I need the length? Or maybe it's just what the C API requires. Python doesn't really need it.

c9s commented 10 years ago

When you pass the string memory pointer to the function, you don't know the returned length.

In C, we usually save the length to avoid length re-calculation. or you may write the functions like this:

int r3_edge_to_str(edge *e, char *buf);  // returns the output string with length
int r3_node_to_str(node *n, char *buf);  // returns the output string with length
int r3_tree_to_str(node *n, char *buf);    // returns the output string with length

which is just like fread function in stdio.h, returns a length of the content.

thedrow commented 10 years ago

That might be a better option. Does it match the rest of the API?

c9s commented 10 years ago

well, for *_to_str functions, yes.

c9s commented 10 years ago

does sprintf allocate memory dynamically? what if the dumped content exceeds the size of memory we allocated ?

thedrow commented 10 years ago

I have no idea. I'm not much of a C dev (it's been a while).

thedrow commented 10 years ago

Well, it seems that it doesn't. http://stackoverflow.com/questions/3774417/sprintf-with-automatic-memory-allocation How do we solve this?

c9s commented 10 years ago

We may use asprintf instead, please see the below usage:

    char * out = NULL;
    int len = asprintf(&out, "%s", "what if");
    printf("%d: %s\n", len, out);
c9s commented 10 years ago

The use case of our API might look like this:

char *buf = NULL;
int len = r3_edge_to_str(e, buf);
free(buf); // free it when it is no longer needed.
thedrow commented 10 years ago

But it uses malloc and you're using another allocator, Is that a problem?

c9s commented 10 years ago

No

[Sent from iPhone]

Omer Katz notifications@github.com ©ó 2014/5/19 18:33 ¼g¹D¡G

But it uses malloc and you're using another allocator, Is that a problem?

¡X Reply to this email directly or view it on GitHub.

c9s commented 10 years ago

I was considering about using jemalloc, but since it won't require a lot of memory allocation during the match operation, it's not requirement anymore. :-p

thedrow commented 10 years ago

So it should be removed from the dependencies no?

c9s commented 10 years ago

well, I think we can remove the flags from Makefile.am for now, but comment them for future.

c9s commented 10 years ago

the include of jemalloc is commented in 41dc7373f3bed655c7548da5229bad6ab1b310d9

thedrow commented 10 years ago

How about using https://github.com/antirez/sds instead of writing our own C API for strings?

thedrow commented 10 years ago

@c9s Also, jemalloc is still in the README and I assume the build process as well.

c9s commented 10 years ago

We could use that for dumping the tree structure, but we should keep the original string functions for tree operations.

c9s commented 10 years ago

dumping tree structure should be small task, I would prefer this: https://github.com/dpaneda/hiphop-php/blob/master/src/util/smart_str.h

which is smaller and the interface is pretty clear.

thedrow commented 10 years ago

We're handling strings all around. sds is being used by Redis which means that the API is pretty much sound. Also, since redis uses jemalloc it will be much easier to integrate it later.

c9s commented 10 years ago

I am fine with sds.

but the tree lookup operation should avoid using another struct to maintain the strings. it requires another pointer to fetch data. which will slow down the lookup.

thedrow commented 10 years ago

I believe you can access the underlying string. Is there somewhere else that we can talk? Maybe gitter.im? Can you open one for this repository?

c9s commented 10 years ago

I think we can use sds for the *_dump related functions, but not for the tree lookup / pattern compile :)

I am outside right now, might not be able to chat XD

thedrow commented 10 years ago

I say let's benchmark. It is being used by Redis which is really fast.

c9s commented 10 years ago

yeah I believe that is fast, but that's fast in string concatenation, not in our own optimized lookup method:

Please see: https://github.com/c9s/r3/blob/master/src/node.c#L326

If you use sds struct to wrap *(n->edges[i]->pattern) that will require another pointer fetch.

Also the preallocation size in sds is 1024_1024 bytes. assume you have N node, that will consume 1024_1024*N bytes for the whole tree.

c9s commented 10 years ago

weird I didn't close this issue

thedrow commented 10 years ago

I did. I'm rewriting it.