Open slanska opened 9 years ago
This error means ActorDB has already been initialized. You can run ./bin/actordb stop, delete everything in data/ directory. Then start it up again ./bin/actordb start, and initialize again using actordb_console
What's stale about the documentation? Yes it is used in production. The project is active (last release was 9 days ago).
We haven't made any real changes in the documentation for a while, because nothing written there has really changed in a while. Right now we are in bug fixing phase. With some feature updates.
Thanks a lot for the prompt response! I followed your instructions and was able to run and play a bit using actordb console. Then I decided to connect to the node via mysql client, I am using JetBrains' 0xDBE tool. I use the following settings: server=127.0.0.1, port=33307, user=myuser, password=mypass, db=music. On attempt to connect/run query I am getting: "The last packet successfully received from the server was 1 milliseconds ago. The last packet sent successfully to the server was 1 milliseconds ago. java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost." Am I missing something here? Thanks in advance.
I don't know this tool, I doubt it would work. ActorDB requires actor ... statements before running an SQL.
'Stale documentation' was a bit inaccurate term. What I meant is that current docs do not give comprehensive and clear understanding of how system is functioning. Need more examples, need more technical information about how data is stored, now map/reduce is working and how it is different from known alternatives, how system works in multi-cluster environment and how it distributes databases across many clusters, how replication is configured. I personally would be interested to see example of accessing actorDB via MySQL client. I think the idea is brilliant, but probably needs to be expressed in a more explicit way. I see that main target for this system is multi-tenant systems, with (semi)isolated relatively small databases, and ability to run map/reduce queries/updates (though, of course, it is not the main strength of the system). I need to learn more and play a bit to see how it goes in practice. Can you share info on what sort of production environment you currently deploy actor db? Number of nodes, database size, write/read load?
How data is stored in the storage engine is described here: http://blog.biokoda.com/post/133121776825/actordb-how-and-why-we-run-sqlite-on-top-of-lmdb
We do not have map/reduce at the moment.
how system works in multi-cluster environment
There is no real difference from a users point of view. Multi-cluster environment just means some actors are in one cluster, some actors are in another.
how it distributes databases across many clusters
An actor lives inside a cluster. A hashing function determines which actor lives in which cluster. The hashing namespace is divided across all nodes/clusters.
how replication is configured
There is only one replication configuration, how much history to remember. The default configuration files are full of comments. When nodes are added ActorDB will automatically start rebalancing.
Can you share info on what sort of production environment you currently deploy actor db? Number of nodes, database size, write/read load?
I will admit our current production setups are not under heavy load. They are 1/3/5 node single cluster setups. All tests are done using multi cluster configuration however. This is also an area which will see more extensive testing in the near future as well.
Thank you for the explanation. Question about connectivity - how I can connect to ActorDB via MySQL client? I tried another GUI utility (Valentina Studio, namely). I am getting "{error, no_actor_defined}" response.
Regarding overall system architecture - let me summarize what I learned from doc and your comments. Node is a physical or virtual machine, running actor db server, with many individual actor databases (in one huge LMDB file, I guess?). Cluster is a set of nodes (1/3/5...) for replication (using Raft consensus). So, all databases on one node get replicated on all other nodes in the cluster. Multi-cluster system - system which combines many clusters and is controlled by global actor (which is responsible for allocation of nodes for the new actors and re-allocation and transferring actors when new node/cluster gets added). Correct?
@slanska: Regarding GUI utilities there can be a problem using the MySQL connector since most of them perfrom automated queries to resolve the schema to display it. These queries are in most cases uncompatible with the ActorDB.
The reason why you are getting such error in Valentina Studio is because it queries ActorDB with the following query to resolve MySQL schema before it allows writing SQL statements:
SELECT SCHEMA_NAME AS `Database`, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA WHERE \tSCHEMA_NAME <> 'mysql' AND \tSCHEMA_NAME <> 'information_schema' AND \tSCHEMA_NAME <> 'performance_schema'
Same goes with JetBrains 0xDBE tool it tries to run this statement:
SELECT 1
Both of the statements are invalid from the ActorDB side since they don't contain the ACTOR definition within the statement.
We will take a look if we can support these utilities.
MySQL client is supported as a connector tool and it should work to querying the database through MySQL command line tool and existing MySQL drivers.
For playing with schema and querying the database you can also use actordb_console tool which is shipped with every ActorDB build and provides all the basic functionality to query the database.
mysql --host 127.0.0.1 --port 33307 -u myuser -p actor type1(u) create; insert into tab values (1,'aa',2); Query OK, 1 row affected (0.01 sec
actor type1(u) create; select * from tab; +------+------+------+ | id | txt | i | +------+------+------+ | 1 | aa | 2 | +------+------+------+ 1 row in set (0.00 sec)
Regarding architecture you are mostly correct. Global actor is just holds state and makes sure that state is spread through out all nodes. It does not make any decisions. Sequence of events when adding node:
Sergej, djustinek, thanks for explanation! I will try to access actor db node via MySQL driver for node.js (this is what I am using at the moment for development). I read all available documentation and blog posts about ActorDB, that I found. Few things still remain unclear for me. I will appreciate if you could help me answering those: 1) How replication factor(cluster size) is determined? If, for example, system has 32 nodes, how I can configure replication factor of 3? I do not need specific steps for now, just looking for conceptual info. I have some experience with Riak and it has special setting in the configuration file, which determines replication factor. Also, is it possible to have different cluster size/replication factor for different databases in the same system? 2) For multi-node, multi-cluster system, is there just single connection address? Will system redirect requests to the right node to process? I.e. will it perform load balancing automatically? In that case, is there a chance that this node will become bottleneck due to high volume of incoming requests? 3) Need examples of accessing actor db via thrift. I do not have any experience with thrift, maybe it is obvious and identical to how thrift is used with, e.g., Cassandra. But exact sample code would be ideal. Example for node.js thrift client would be even more ideal. 4) Not sure how key value storage is working. Is it a just special kind of type (ACTOR type), which is spread across ALL nodes? How is it different from custom type?
Continue... 5) I realize you do not have any API/UI for monitoring/health checking/configuration of cluster? Any plans on that? 6) Do you perform any stress tests? What would be considerations for multi-tenant system with 10K-100+K users (i.e. separate actors), varying in size from few megabytes to few gigabytes, installed on cluster with dozens of nodes, with replication factor 2 or 3?
Overall, I like idea of actor db a lot and would like to play a bit. I think that documentation, examples and tests are areas for improvements.
If mytype is KV, then this actor statement: actor mytype(mykey); Will mean mykey is not an actor, but it is just a key. This key will be hashed to determine which shard it belongs to, then the SQL statements will be sent to that shard(actor). You can use multiple tables, but they all must use a foreign key to actors table. Compared to most other KV stores it is a very powerful model, because your value is not just a blob, it can be multiple columns and multiple tables.
since all individual db files are stored in a single lmdb, how it is efficient in terms of highly concurrent access? Correct me if I am wrong, but I was under impression that lmdb locks entire file for writing?
The dirty secret of most storage engines is that they are single writer. Actual reads are generally concurrent and lmdb has the best read concurrency of any engine, but you can't write with multiple threads on actual hard drives/filesystems and have good performance. Unless you are only appending to a file and nothing else, then you can do it with multiple threads. This holds true for lmdb, leveldb and I'm pretty sure postgresql as well.
It's pointless to worry about it really. Everything has the same problem.
Nice explanation! Thanks. Though, I am still not sure whether it is applicable to SSD drives? For HDD it is clear - access is sequential, as there is one head only. Even with I/O optimization by SCSI drives. But SSD is different - it allows random access with zero overhead for "repositioning".
On Monday, November 23, 2015 9:52 AM, Sergej Jurečko <notifications@github.com> wrote:
The dirty secret of most storage engines is that they are single writer. Actual reads are generally concurrent and lmdb has the best read concurrency of any engine, but you can't write with multiple threads on actual hard drives/filesystems and have good performance. Unless you are only appending to a file and nothing else, then you can do it with multiple threads. This holds true for lmdb, leveldb and I'm pretty sure postgresql as well.It's pointless to worry about it really. Everything has the same problem.— Reply to this email directly or view it on GitHub.
Maybe. There is still the matter of the file system. You can run multiple storage folders on the same drive and see if it truly doubles performance. ActorDB will spread actors across additional folders.
Do you have any benchmark numbers for the scenario with multiple database access? I guess that placing all actors into single lmdb file might introduce unnecessary performance restrictions in compare to scenario where every database lives in int own file. Again, it is just my guess, needs to be confirmed empirically.
On Monday, November 23, 2015 11:03 AM, Sergej Jurečko <notifications@github.com> wrote:
Maybe. There is still the matter of the file system. You can run multiple storage folders on the same drive and see if it truly doubles performance. ActorDB will spread actors across additional folders. — Reply to this email directly or view it on GitHub.
Do you have any benchmark numbers for the scenario with multiple database access?
Using multiple lmdb's on a single ssd? No I do not.
I guess that placing all actors into single lmdb file might introduce unnecessary performance restrictions in compare to scenario where every database lives in int own file. Again, it is just my guess, needs to be confirmed empirically.
Every actor being in its own file? So using sqlite3 files directly on disk? That would perform awfully. Hard drives are not made for that kind of concurrency. A big advantage of lmdb is also that it does not copy data around and we can compress the pages before writing. Once written it stays where it is (sqlite checkpoints wal file so everything is written twice).
SSDs are much better at random access, I'm not so sure they are that much better at concurrency.
I am trying to follow setup steps (http://www.actordb.com/docs-getstarted.html#download_run) and on attempt to execute ./bin/actordb_console -f etc/init.example.sql, I am getting: Connect/login error: {error, {login_failed, <<"Username and/or password incorrect.">>}}
This project looks very interesting, indeed. What is its current status? I see that github sources have been updated, but documentation is stale and was not updated for long time. Is actordb used anywhere in production?