Open bonesmoses opened 9 years ago
Which use-case are you more interested in?
pg_shard
works well for applications that might otherwise be using e.g. mongoDB. From their documentation on the matter:
When a single write operation modifies multiple documents, the modification of each document is atomic, but the operation as a whole is not atomic and other operations may interleave.
Because pg_shard
is an extension you may selectively apply to certain tables, it can allow users to maintain an all-PostgreSQL datastore, using certain tables with your usual ORMs, etc., and interacting with others in a manner more similar to e.g. mongoDB.
Though transactions are one way of handling bulk ingest, this could also be addressed with better COPY
support, which is already in our backlog as #62. We also recently closed #61, though that issue was more about addressing usability than performance.
It's worth mentioning that a single PostgreSQL box does not scale well to many writers—no matter how many cores—due to existing issues with contention around buffer locking, etc. But because pg_shard
is farming out its INSERT
s to many smaller PostgreSQL instances, it promises the possibility of sidestepping those scaling limitations.
Ultimately, pg_shard
is unlikely to become a product you can just drop in behind, say, Rails and have it "just work" in the sense that Rails is entirely agnostic to the sharding but can continue behaving as though it's talking to a single PostgreSQL box. But we do have our eye on certain targeted use cases and scaling performance going forward, so these bits of feedback from the community are great to have as we prioritize the backlog.
The use case that has the most utility is a multi-statement transaction targeting the master and cascading to the child shards.
However, the primary problem is that pg_shard
obfuscates its hashing and distribution algorithms (as it should, really) so there's no way to reliably target specific shards. That makes it impossible to even emulate transaction support at the application level. Beyond that though, is that the mere presence of a transaction context automatically disables the module.
Anyone using a connection with autocommit disabled won't be compatible with any pg_shard
table in its current form. Even a single select statement wrapped in a transaction will fail. Or more likely, several queries, one of which targets a pg_shard
table, would still be impossible.
django and peewee use transactions by default as well and i am not even sure yet you are able to disable this behavior as it may be even a psycopg2 or psycopg2cffi thing. And i just rewrote the hole stack for postgres because mongodb sucks BADLY if you go over 100m rows.
I think it would be easier to support multiple statements in 1 sql-string all targeting the same shard (all of them specifying the sharding column) at first. (my usecase)
@jasonmp85 I have a use case for Multi-statement transactions targeting a single shard: The event store of event sourcing (ES) design pattern. In this use case you have two tables: events table and optimistic-locks table that it's used to insert events transactionally in table events. ES on sql is simple, but it doesn't scale. ES on nosql is hard, but it scales. I was hoping pg_shard to have the best of both worlds, but it's more like nosql in this case.
I think voltdb is the only solution at the moment that scales like nosql and it is simple to implement.
More info about ES+CQRS:
Note in voltdb you have transactions only in 1 db-call (ex sending multiple sql-statements in 1 string) and not several strings like you can do with plain postgresql/mysql. Meaning, session is autocommit. The presentation had too many buzzwords.
@ddorian You are right about voltdb. Let's say it's possible to implement the optimistic lock in PL/pgSQL and pg_shard. I still have the problem that events_123 and optimistic_lock_123 tables are not on the same server. :-(
Currently when attempting to execute a query within a transaction, the following output is produced:
Since transactions are not supported, and only single
INSERTS
are allowed, this makes data import incredibly slow. Further, this makespg_shard
completely unsuitable for use within a production environment where data integrity is important.