bigeasy / locket

A pure-JavaScript implementation of LevelDB for LevelUP.
MIT License
81 stars 3 forks source link

Locket general discussion. #1

Open bigeasy opened 11 years ago

bigeasy commented 11 years ago

Feel free to discuss your hopes and dreams for Locket in this issue. If you've found a bug, please just open an issue.

Early releases are numbered 0.0.x. Every micro-bump is a possible breaking change. Please use these pre-releases to help development, but use them carefully.

Raynos commented 11 years ago

I am 100% excited with a full JS leveldb implementation.

@rvagg is also working on level in JS stuff ( https://github.com/rvagg/leveljs-sst )

Would love to use this locally instead of leveldown for tests and stuff.

bigeasy commented 11 years ago

@raynos Thank you. I'm looking forward to exposing the underlying b-tree to LevelDB'ers and getting the benefit of all those unit tests and benchmarks. The underlying implementation is Strata. It does have 100% test coverage, but it's hard to know what the issues are going to be with a concurrent data structure.

I see you contribute to the LevelDB core. I'm probably boing to be hanging out on #leveldb with week, asking about the finer points of the API implementation. I had a long exchange with @dominictarr in a Strata Issue about LevelDB, there and IRC, which gave me a lot of direction on how to implement an AbstractLevelDOWN.

I'm not sure if the LevelDB community meets anywhere outside of IRC on the web, but it would be nice to meet you all at some point. Is there ever a LevelDB podcast?

Raynos commented 11 years ago

There is a leveldb podcast. You should ping @mikeal about it. There are leveldb meetups around the world, in aus & ireland.

I really should host the SF leveldb meetup.

bigeasy commented 11 years ago

I probably should ping @mikeal. I won't be making it to any meetups any time soon, but an online meetup, it would do a lot to help to help understand the goals of the community.

Raynos commented 11 years ago

@bigeasy talking to people in #leveldb is a start.

dominictarr commented 11 years ago

yes, ##leveldb!

bigeasy commented 11 years ago

I'm @prettyrobots on freenode when I'm on freenode.

mikeal commented 11 years ago

I'd like to get the podcast going more regularly and be less of a thing we try hard to produce for an audience and more like a discussion about something we need to talk about anyway that we just publish after.

If there's a particular topic you'd really like to talk about and achieve some clarity on we should put one together with all the right people.

bigeasy commented 11 years ago

@mikeal That would be a good format. One that might appeal to a wider audience than you might imagine. Kind of like Peep Code's Play by Play. Rather than trying to compare the features and benefits of one module against another, talking about a problem so that people can hear how problems get solved. Raw, fuzzy, show your thinking.

bigeasy commented 11 years ago

Those of you watching this thread may or may not find the diary interesting. I don't expect you to comment, nor do I mean to bother you. I'm going to, as they say, just leave this here...

https://github.com/bigeasy/locket/commit/eba30fc54e28b0dcc6d61d31de6c0b2ca604296d

bigeasy commented 11 years ago

Locket status: You can follow along with the Locket progress by reading my commit lock over at Strata. I've gathered Strata issues into meaningful milestones. An ascending completeness view is a good way to track progress. There are four milestones that a blocking Locket. They will be implemented, probably in this order.

Store Keys in Leaf Pages — Whenever you read a b-tree tutorial, they talk about how wonderful it is that you can always keep the tree in balance by splitting as you go. If the leaf page files, split it. If that fills it's parent, split that. This is a false economy when you're interested in concurrency. Balance is a separate, delayed operation now. But, long ago, I was still laboring under the delusion that split as you go was an important property of a b-tree. I tried hard to make that work, while keeping the tree live. Part of liveness was to force the branch pages to look up their keys as needed, which minimized the I/O necessary to rewrite a branch page. The thinking was so wrong headed, I can't grasp it any longer.

In any case, I only recently realized that storing only addresses in branch pages was a legacy, once that could be easily removed. Prior to implementing a binary file format, I'm going to make this change, since it will help the file format requirements to settle. No point in creating the binary file format, then immediately altering it to handle keys.

Create Binary File Format — I've already one a lot of work to make the JSON file format more like what a binary file format would be. I'm using length encodings instead of searching for new line — 0xa — record terminators. A binary file format would be like the line-by-line JSON file format, it will checksum each log entry in the leaf page log file, you'll be able to read it backwards and forwards, if your data becomes corrupted, you'll still be able to weed through saving those records that can be saved.

Binary is necessary because that's what LevelDB provides, what LevelUP expects.

Use External Caching and Locking Libraries — I've externalized the caching and locking primitives in Strata to Magazine and Sequester respectively. Magazine is needed for Locket. Strata has an LRU page cache, but each Strata tree will have a separate page cache. This is a bother when Locket will use multiple Strata trees to implement Log-Structured Merge; many trees with separate caches means a meaningless LRU.

While I'm adding dependencies, I might as well do Sequester. I like that the testing is externalized.

Also, I want to use the new crypto.Hash interface for checksums. It allows me to plug in different hashes from the hashing collection I'm working on.

Support Locket — By this point I'm doing the things that Locket needs, Issues specific to Locket, stuff that I want anyway, but Locket wants now. Reverse iteration is a part of Strata, but it's not exposed through the interface, so that is the only issue in Support Locket so far.

bigeasy commented 10 years ago

Strata is done. It supports Locket. Locket now passes all the AbstractLevelDOWN tests.

You'll find that the details of the log merging and the version control have been broken out into a great many supporting libraries. These libraries can be used to build an MVCC database of your own design if you would like.

What stands in the way of an initial release is balancing, merging and purging (deleted records). Currently, I'm just writing out into a Strata tree without ever balancing the tree. This is something that needs a simple solution, like check for balance every leaf page size batch operations, merge less frequently than that, purge less frequently than that.

If anyone can suggest a LevelDB corpus to test against, that would be swell.