creationix / node-leveldb

NodeJS bindings to levelDB - a fast and lightweight key/value database library
http://code.google.com/p/leveldb/
113 stars 32 forks source link

Refactor to use proper header and source files #8

Closed carter-thaxton closed 13 years ago

carter-thaxton commented 13 years ago

Hey there,

I've gotten started on a branch to make node-leveldb use EIO, which will actually go async. However, until I finish that, I figured I'd submit a refactor that I did, so we can all stay synced up.

This basically pulls the classes apart into separate header and source files, which are more loosely coupled, along with best practices of C++. It always makes me cringe to see .cc files being #included. :) There's no change in behavior whatsoever with this, but I hope you'll agree the C++ is better organized.

I also refactored the build script a bit. Hopefully you won't mind that I removed "dumpdata". After all, that's what node-leveldb is for now!

Thanks,

creationix commented 13 years ago

Fine by me, @tilgovi should double check the C++ since that't not my area of expertise. Also, would it be possible to squash this into a single commit?

creationix commented 13 years ago

I do hope we get the file structure settled soon. I thought someone had already moved stuff into header files. All this restructuring of code really confuses the version control system and makes it hard to tell where code came from.

carter-thaxton commented 13 years ago

Cool, I've squashed the changes.

tilgovi commented 13 years ago

Beautiful. It's not super easy to read such a big diff, but correct me if I'm wrong: 1) You dropped the namespace blocks from the .cc files, replaced with 'using'. Goes one step beyond what I did (which was to move the prototypes into the .h files to begin with) 2) You dropped index.cc because, really, it serves no purpose at all.* 3) You dropped dump data. I don't even know what this is, but I'm okay with it being gone.

*I once tried to remove the definitions at the top of index.cc and it broke compilation. I'll verify things work before I merge this later.

On the topic of async versions, I did some work last night and I'll try to open a branch to show it tonight. I think I found a way we can just use a couple helpers and it'll get us around the onus of making separate async versions and callback handlers for every function. Basically leveraging the fact that every JS function in C++ looks like Handle<Value> (*js_function)(Arguments&) I'm going to see if I can just stash away the Arguments and the passed in callback function, strip the callback off of Arguments (or nullify it), and "recur" on the sync version in an eio thread. So, just one function in helpers called MaybeDefer or some such thing.

carter-thaxton commented 13 years ago

Yep. Right on all 3 counts...

1) "using namespace" simplifies references to other declarations, but doesn't change your own declarations. Therefore, you'll see that the .h files still wrap the class declarations in namespace blocks. I usually like to avoid indenting the top-level namespace block, and apparently so does Ry. Check out node_object_wrap.h, for an example of what I mean.

2) Yes, the idea of .cc files including other .cc files is pretty hacky. The whole idea is to enable separate compilation, so the compiler can chew on each source file independently, and spit out a corresponding .o file. The .h files specify the interface for that .o file (and in C++, the interface includes the instance variables and private methods, because it affects how objects are laid out in memory, which users of that object need to know. e.g. sizeof(DB) needs to be well-defined, as do the vtbls for virtual methods, which is why private methods need to be in there, too...)

3) I think dumpdata was a program that Tim or someone wrote, just to test out using leveldb on its own, without node. Great little tool for testing - but doesn't belong in this library.

*When you declare a static instance variable in class declaration, it still needs to be defined once somewhere. I moved those definitions to the corresponding .cc file, just like the definitions of methods.

Remember, declarations can be included and seen multiple times by the compiler. They basically help the compiler know about other types, symbols, and the offsets and sizes of those symbols. Definitions, on the other hand, actually put something concrete into the .o file, and the linker finally puts it all together. If you have a declaration in a class (instance variable, or a method), say in a header file, but you forget to define it in one of your .cc files, the linker will complain. The compiler probably won't, because it works on each file independently. Conversely, if you #include .cc files more than once, you end up defining multiple copies of a variable, and the linker will again complain. You always need exactly one definition for everything that's declared, and those typically go in the .cc file.

Inlining is a special case. You can inline a method definition inside of the declaration, say for trivial constructors or getters, but that causes no definition to be produced in a .o file. Instead, the compiler always produces a copy of that code in each place it's used. That's why it's in the .h file - it's part of the interface. All callers of an inlined method need to know the actual code that gets put in each call. This is usually frowned upon for anything but the smallest of methods. Though it looks simpler, it's actually more complex to understand, so I usually stick to the rule of never inlining, unless it becomes important for perf after doing some profiling. Make sense?

BTW, on the Mac, some of the linking is actually deferred until runtime by default, so you might not even see the issue until startup. If you see 'dyld' errors, that's what's up...

Hope this helps.

tilgovi commented 13 years ago

Cool. I'm just gonna merge this then from here and close the issue. I haven't poured over line by line, but I grasp everything no problem.