Questions after reading docs

dessalines commented 9 years ago

I read over the actorDB docs and have a few questions:

How do I choose what data gets replicated? This page tells me how to insert rows into an actor's table, but not how to choose how much of that information gets synced or replicated. Let's say I have user nodes that should only store their own info but can still query the petabytes of data held by different servers. Is this sharing portions of the data, or completely replicating all of it?
If the servers only hold portions of the data, how do distributed joins work? Let's say one shard has one table, and another shard has another table I need to join to. Would I just run 'ACTOR type1(*)', make sure they're the same type, and run my SQL statement?
What are type1, type2, etc.? Are they just arbitrary names for groups of tables that can talk to each other?
I want to create a distributed system where users can join and leave at will. Is there any way to do this? (Maybe keep a shared table of IP addresses, change the configuration file, and restart ActorDB?)
And most importantly, I'd like to be able to test this locally (I don't have a network of computers to play around with). Is there any way to run multiple instances, a test mode, or something else that lets me test the multi-server functionality on one machine? Most DHTs have this.

SergejJurecko commented 9 years ago

Every actor is an independent, isolated database. If you are running ActorDB over multiple servers, actors get replicated automatically. If you have multiple clusters of servers (generally, clusters of 3 are recommended), every actor lives inside one cluster and is replicated within that cluster. The example in section 2.4.1 creates 3 isolated databases, each with its own table named "tab". When it comes to sharding, ActorDB tries to spread actors evenly across servers, placing some actors in one cluster and some in another. ActorDB does not currently have functionality to control how an individual actor is replicated; it is placed automatically.
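To picture what "independent, isolated database" and "placed in some cluster" mean, here is a toy Python model, not ActorDB code. Each actor is modeled as its own in-memory SQLite database with a private "tab" table. The cluster layout, the `place_actor` helper and the hash-based placement rule are hypothetical assumptions for illustration; ActorDB's real placement logic is not described in this thread.

```python
import hashlib
import sqlite3

# Hypothetical layout: two clusters of 3 servers each.
CLUSTERS = {
    "cluster_a": ["srv1", "srv2", "srv3"],
    "cluster_b": ["srv4", "srv5", "srv6"],
}


def place_actor(actor_name: str) -> str:
    """Toy placement rule: hash the actor name to pick one cluster."""
    names = sorted(CLUSTERS)
    digest = hashlib.sha256(actor_name.encode("utf-8")).digest()
    return names[int.from_bytes(digest[:8], "big") % len(names)]


def create_actor() -> sqlite3.Connection:
    """Each actor is a separate database with its own copy of table 'tab'."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, txt TEXT)")
    return db


actors = {name: create_actor() for name in ("a1", "a2", "a3")}
# A row written to a1 is invisible to a2 and a3: their "tab" tables are unrelated.
actors["a1"].execute("INSERT INTO tab (txt) VALUES ('only in a1')")

for name, db in actors.items():
    count = db.execute("SELECT COUNT(*) FROM tab").fetchone()[0]
    cluster = place_actor(name)
    # In this model the actor is replicated to every server of its cluster only.
    print(name, cluster, CLUSTERS[cluster], count)
```

Placement here is deterministic per name, so the same actor always maps to the same cluster; the user cannot choose it, which mirrors the "placed automatically" behavior described above.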
ActorDB does not have true joins across actors. It has multi-actor queries, which are described in section 4 of the documentation. This is an area that could see a lot of improvement in future versions. ActorDB works best in scenarios where having many small, isolated databases is key; giving up cross-actor joins is the trade-off ActorDB makes to achieve scalability.
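Without cross-actor joins, combining rows that live in two different actors ends up happening outside a single SQL statement. A minimal Python sketch of that pattern, assuming two hypothetical actors modeled as separate SQLite databases: each is queried on its own and the rows are joined in application code. This is not ActorDB's multi-actor query syntax from section 4; the table names and data are made up.

```python
import sqlite3


def new_actor(schema: str) -> sqlite3.Connection:
    """Model one actor as an isolated in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.executescript(schema)
    return db


# Two hypothetical actors that could live on different servers.
users = new_actor("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);")
orders = new_actor(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);"
)

users.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
orders.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "book"), (11, 2, "pen"), (12, 1, "lamp")],
)

# One JOIN cannot span both databases, so read each actor separately and
# join in application code (a simple hash join on the user id).
name_by_id = dict(users.execute("SELECT id, name FROM users"))
joined = [
    (name_by_id[user_id], item)
    for user_id, item in orders.execute("SELECT user_id, item FROM orders ORDER BY id")
    if user_id in name_by_id
]
print(joined)  # [('ann', 'book'), ('bob', 'pen'), ('ann', 'lamp')]
```

This is also why the thread recommends ActorDB for workloads made of many small, self-contained databases: if most queries stay inside one actor, this client-side joining is rarely needed.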
type1 and type2 are two completely different types of databases that can be part of ActorDB: type1 can have one schema and type2 an entirely different one. Both can have as many tables as you want. If you were creating a blog, for instance, every blog post plus the comments on that post would be one type of actor, and every registered user would be a second type of actor.
Actors can be created and deleted at any time. I would need more information to answer further.
You can run multiple instances on the same machine or just one instance; it makes no difference when it comes to running queries. To create a cluster of 3, for instance, check out actordb into 3 folders (you need Erlang version 17+ installed). In every folder, set the rpcport value in etc/app.config to something unique and set the node name in etc/vm.args, which must also be different for every node; then list all of the nodes in nodes.yaml. Once they are running, run ./actordbctrl init in one of the folders. We have a tool to run ActorDB in a cluster automatically, but it is built to run tests and requires Erlang knowledge to use.
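The blog example can be modeled as two actor types, each with its own schema, where every actor of a type gets a private copy of that type's tables. A Python sketch under that assumption: the type names `post` and `user`, their tables, and the `ACTOR_TYPES`/`create_actor` helpers are hypothetical. It only illustrates the idea and is not how ActorDB itself declares types.

```python
import sqlite3

# Hypothetical actor types: one schema per type.
ACTOR_TYPES = {
    "post": [
        "CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, body TEXT)",
        "CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT, txt TEXT)",
    ],
    "user": [
        "CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT, email TEXT)",
    ],
}

actors: dict[tuple[str, str], sqlite3.Connection] = {}


def create_actor(actor_type: str, actor_name: str) -> sqlite3.Connection:
    """Create an isolated database for one actor using its type's schema."""
    db = sqlite3.connect(":memory:")
    for stmt in ACTOR_TYPES[actor_type]:
        db.execute(stmt)
    actors[(actor_type, actor_name)] = db
    return db


def tables(db: sqlite3.Connection) -> list[str]:
    """List the tables an actor has, sorted by name."""
    rows = db.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


first_post = create_actor("post", "first-post")
first_post.execute("INSERT INTO comments (author, txt) VALUES ('ann', 'Nice post')")
ann = create_actor("user", "ann")

print(tables(first_post))  # ['comments', 'post']
print(tables(ann))         # ['profile']
```

A post actor holds one post and its comments together, so reading a post with its comments never has to leave that actor.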
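The per-folder settings for a local 3-node cluster can be planned with a small Python helper before editing any files. This is a hypothetical sketch: the folder names, the base port 45000 and the node{i}@127.0.0.1 naming are example values, not ActorDB defaults. It only prints what to put where and does not write app.config, vm.args or nodes.yaml, whose exact syntax is not shown in this thread.

```python
from dataclasses import dataclass

# Example values only (hypothetical): any free ports and distinct names work.
BASE_RPC_PORT = 45000
HOST = "127.0.0.1"


@dataclass(frozen=True)
class LocalNode:
    folder: str     # separate checkout directory for this instance
    rpcport: int    # rpcport value for this folder's etc/app.config
    node_name: str  # node name for this folder's etc/vm.args


def plan_local_cluster(count: int = 3) -> list[LocalNode]:
    """Give every local instance its own folder, rpcport and node name."""
    nodes = [
        LocalNode(f"actordb{i}", BASE_RPC_PORT + i, f"node{i}@{HOST}")
        for i in range(1, count + 1)
    ]
    # Instances sharing one machine must not reuse a port or a node name.
    if len({n.rpcport for n in nodes}) != count or len({n.node_name for n in nodes}) != count:
        raise ValueError("rpcport and node name must be unique per instance")
    return nodes


plan = plan_local_cluster(3)
for node in plan:
    print(f"{node.folder}: rpcport={node.rpcport}, node name={node.node_name}")
# All node names also have to be listed in nodes.yaml before running
# ./actordbctrl init in one of the folders.
print("list in nodes.yaml:", ", ".join(n.node_name for n in plan))
```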

dessalines commented 9 years ago

@SergejJurecko Thanks a bunch, this is really helpful. I'll be doing some testing soon and will let you know about any issues I run into. Thanks!

biokoda / actordb

Questions after reading docs #3