HouzuoGuo / tiedot

A rudimentary implementation of a basic document (NoSQL) database in Go
BSD 2-Clause "Simplified" License
2.72k stars 257 forks source link

Investigate IO queuing and its benefits #23

Closed HouzuoGuo closed 10 years ago

HouzuoGuo commented 10 years ago

Investigate how introducing IO queues can benefit tiedot's operation.

The IO queues will operate on:

alexandrestein commented 10 years ago

I made a quick documentation on it: https://drive.google.com/folderview?id=0B9CkwEErGkG1WVVmZkJHMXNackU&usp=sharing ("docs" folder)

As I said to you, I'm not an expert in database but I'm in touch with people who deal with evy bandwidth and concurrent access. It sounds to me like databases are facing the same kind of problems than big storage. ;-)

Tell me how it sounds to you. I hope it will help.

Regards, Alexandre Stein

HouzuoGuo commented 10 years ago

Hello Alex.

Those materials are indeed very helpful, thank you very much!

I am also studying Mongo's queue implementation and see if it helps.

NoahShen commented 10 years ago

I wrote a key-value db based on B+Tree in Java, maybe it's helpful But don't know if you can read Chinese. I wrote the code comments and doc in Chinese. I'll take time to rewrite the doc and translate it into English.

project location: https://code.google.com/p/jue/source/browse#svn%2Ftrunk%2Fjue%2Fsrc%2Fcom%2Fgooglecode%2Fjue doc: https://drive.google.com/folderview?id=0B2aM_tq3elwXMjJlMjkyMzktNjFmYS00YjBjLWIyYzEtMzUxYWNlZTM2NjNl&usp=sharing

Thanks.

NoahShen commented 10 years ago

Hi Houzuo, I use three data sync strategies in my project now. It's like Redis.

Maybe you can use this as reference.

NoahShen commented 10 years ago

Sorry, clicked wrong button, made a mistake.

HouzuoGuo commented 10 years ago

Thanks mate, what about:

HouzuoGuo commented 10 years ago

I am not sure if the second point is useful or not, but the first point has been implemented: bfbc18d84b8e78e1b32bdf5bad5b3ba3e448d404

In the meanwhile, I have not forgotten IO queue! Combining with another known issue https://github.com/HouzuoGuo/tiedot/issues/39 I now have a new idea...

Our experience tells us that tiedot scales well enough from 1 to 4 threads, and it takes advantage of hyperthreading as well. However, going beyond 8 threads does not improve performance at all.

And we know that:

So how about:

That will give us several advantages:

But there is a disadvantage - the "tiedot new generation" will no longer be compatible with the current tiedot.

What do you think?

alexandrestein commented 10 years ago

I think this is a VERY VERY good idea. :+1:

In my cases the retrocompatibility is not a problem. Offline procedure and other tricks are OK to me.

I don't have much time to play with Tiedot this time but I still can do some benchmarks for you.

Kind regards

NoahShen commented 10 years ago

Good idea! :+1: Collection operations are more like Map-Reduce, right? Maybe you can also write a tool to convert old tiedot file to new generation file.

HouzuoGuo commented 10 years ago

OK, let's go ahead with that.

HouzuoGuo commented 10 years ago

Good news - the new idea works pretty well! I setup proof of concept benchmarks in nextgen branch, and the scalability of simple document CRUD operations went beyond my expectation. I am still working on improving hash scan performance, UID operations have really poor performance at the moment.

Stay tuned.

alexandrestein commented 10 years ago

Maybe use B+Tree instead of hash could help improve UID operations. This is an implementation of B+Tree for Go you could try: https://github.com/cznic/b

HouzuoGuo commented 10 years ago

Will do...

HouzuoGuo commented 10 years ago

Just an update on this issue:

By partitioning collection index/data and executing IO operations in parallel, tiedot still cannot achieve linear scalability (no speed up past 12 cores).

But don't worry - the "nextgen" branch brings a totally different approach (sorry - it happened again). It borrows the "single thread" idea from Redis, plus the coordination mechanism used by Cassandra, and tiedot runs each collection partition in an independent process, uses Unix domain socket as IPC mechanism.

Let's see...

HouzuoGuo commented 10 years ago

I have experimented with different implementations of using message passing mechanism to implement collection partitions, however none of them brought any performance improvement at all. Perhaps I could use more HPC programming exercises in Go. For now, I do not think that IO queueing will help improving tiedot scalability. Let's leave this one for later, ok?