Data Access Layer - Githubissues

jeevatkm commented 6 years ago

aah's goal of Data Access Layer as follows-

Support RDBMS and NoSQL data sources
Seamless usage of multiple data sources (be it RDMBS or NoSQL) and having one as default data source
Data source aware model/entity, so that aah user doesn't have to deal with hard part targeting different data source.

I will do my analysis first then planned out implementation part.

Data Access scope is vast, I'm gonna go by step by step

[ ] Create foundation
[ ] Mapping/Binding Result Sets into struct
[ ] CURD style operations
[ ] Persistence events/hooks
[ ] Multiple Database support (to begin with MySQL & Postgres)
[ ] Multiple DataSource support and connection management
[ ] Prepared Statements with Named Arguments binding from struct, map
[ ] Native query execution
[ ] Query Cache in-memory to begin with

Upcoming List

Database tables to struct Model generation type safe
Datbase Migration (up & down) with audit-trail
Transactions
Data Source aware models
Debug logging (first release would have basic level)
Filters/Processors
etc

broklyngagah commented 6 years ago

I use gorm in all of my RDBMS projects. +1 for gorm..

jeevatkm commented 6 years ago

@broklyngagah Thank you for your inputs.

I'm currently in the process of figuring out and provide meaningful design and usage - As you know "Modern application uses RDBMS as well as NoSQL".

blagodus commented 6 years ago

@jeevatkm you can look at these packages for inspiration(integration):

Do you plan to build new package or just integrate existing?

jeevatkm commented 6 years ago

@blagodus Thank you for your inputs, I will surely look into it. It seems in the initial write up my goals didn't come out properly. I have just update it.

Answer to your question, I have not yet decided the approach whether new package or integrating existing. I will be doing my analysis inline with goals of this implementation.

Once analysis is done, I would publish my direction on this thread.

AugustHell commented 6 years ago

Seamless usage of multiple data sources (be it RDMBS or NoSQL) and having one as default data source

You might have a look on the awesome CockroachDB. It is written in go, incredible fast, made for scaleability and it can be connected thru postgres-adapters. JSON types makes it interesting for NoSQL users too.

jeevatkm commented 6 years ago

@AugustHell Thank you for your inputs.

AugustHell commented 6 years ago

The database layer is a big topic in my eyes. Most frameworks struggle on that part or even touch it. In the go world I haven't find yet a really good one. Some thoughts:

Multiple Connection Handling

As write operations are the heavy impacts for any database, a solution for high traffic sites is to have a pure read and a read/write connection to two different servers that are handling the sync on their own. Another real life example are geo located connections to serve data from the nearest db server to keep latency low.

Permanent and Temporary Connections

The number of db connections is most time limited, so a permanent connection seems great, but is not in any case. A developer should have the freedom of choice to use what he needs.

Database Modeling

I really like to have a structured model for each db table I use. In projects you need a lot of tables, you get quick bored creating them. The helping methods I know of, are creating thru command line scripts, auto migration like in gorm, or extracting from the database's "SHOW TABLE". While some databases have great modeling tools, like eg MySQL Workbench, others don't. Hence offering only one way of modeling is limiting it's use.

Topic based Migrations

When it comes to migrations the most seen solution is a time based one with up/down scripts. In my opinion, this is not really practicable as when developing in a team on a bigger project not every one is focused on the same tables, Soon there are a lot of up and downs you have to do if you want to revert one manipulation. A table based versioning reflecting the table relations would be nicer.

Multiple Query Cache

While most databases have a query cache, table joins are sometimes not the most perfomant solution. So you hack your data needs with serveral querys together. Putting that in a cache can speed up the app immense. So if it comes to a query builder, some kind of transaction block which aim is not the data integrity but the caching of the output would be nice to have.

That's not a complete list, just what came to my mind at the moment.

jeevatkm commented 6 years ago

Update: My homework and analysis are progressing well on data access layer, however I need further time to start foundation. So pulling this one out from v0.11.0 and making v0.11.0 Milestone release.

Thank you very much for the support and understanding.

jeevatkm commented 6 years ago

I have updated with plan and pulled out from initial and adding it here for reference:

Following is initial leg work on RDBMS findings-

I have done my homework (read documentation, source code) around many data access layer libraries exists now. It turns out following is the viable candidates for integration as data access layer with aah.

github.com/jmoiron/sqlx - https://github.com/jmoiron/sqlx - minimal one
upper.io/db.v3 - https://upper.io/db.v3 - mostly needed features
github.com/jinzhu/gorm - http://gorm.io - mostly needed features

AugustHell commented 5 years ago

Over the past week I researched more about this topic, as I really need a better solution that are yet out there. My focus is a sql database (either cockroach with postgress driver or percona server with mysql driver), while keystores in memory like redis or bolt db are still in mind for heavy perfomance uses.

So here are some more thoughs for the data acess layer, i came accross:

Transactions and prepared statements can be a pitfall Both depend on the connection they get executed and both are involving extra round trips to the database. However transactions are great for data integrety operations and absolutely need for them. Prepared statements are only useful for repeated queries on the same connection, if the database and the db driver supports them natively. Reason is, that the prepared statement has 3 round trips to the database. On the other hand repeated queries can get boosted with more db connections out of the pool, or by avoiding them by building queries that give back all the needed data at once. In a lot of cases, it's better to do the preparation of paramentarized queries in the program and send it to the db, instead of letting the database doing it.

Row Identifiers and unique id's Using a sql db's auto increment id can lead to problems when backing up and restoring the data, as well as for concurency. A better way to identify entities/rows is by letting the program generate the uuid and past it as string. This avoids having a maximum number of id (in opposite of int32, which some db's still use). Even building id's by the program, leads to independency of the choosen data persistence solution. An approach for generating could be: https://github.com/sony/sonyflake which is based on time and machine id. As machines can crash and app portability is better, I would prefer using an application id instead of a machine id.

Controller, model and data access layer dependencies Whatever persistence solution is used now, might not be the best in the near future. We've seen a lot of new approaches with kv-stores, nosql, cloud services, sharding and so on in the last years. Developing an application with a framework should not depend on the persistence used at initial design. Decoupling the data access layer from controllers and models makes it easier to test too. To get independent of the data source, the database/sql package defines interfaces and the data access layer should do it too. The base idea of that is reprensented in this: https://husio.github.io/blog/post/accessing-data-in-go/

Perfomance wise Depending on the applications use case, access to the data can get heavily time consuming and is often the bottleneck of the app performance wise. While using a ORM like gorm feels great in developing, it's reflection and query builder is a real performance hit leading to multiply the data access times by factors. Another approach is using code generation at build time like https://github.com/volatiletech/sqlboiler or https://github.com/gnormal/gnorm. Well, the first time I've seen that, it didn't look very comfortable for me, but as more as I learn about whats going up behind what influence it has on being productive as well as writing a perfomant application, my interest growed. As aah is already generating code, using parts of that idea might be worth to think of. For instance in data access controllers a query builder can help the developer, while the queries itselves are composed at buildtime, instead of building and executing them at runtime.

jeevatkm commented 5 years ago

@AugustHell Thank you for your inputs. I will utilize your insights while design and implementation.

go-aah / aah

Data Access Layer #161

Multiple Connection Handling

Permanent and Temporary Connections

Database Modeling

Topic based Migrations

Multiple Query Cache