apache / shardingsphere

Distributed SQL transaction & query engine for data sharding, scaling, encryption, and more - on any database.
Apache License 2.0
19.89k stars 6.73k forks source link

Does this library support sharding tables with same name and schema in different databases #28733

Closed sivaraman-27 closed 3 months ago

sivaraman-27 commented 1 year ago

Please understand my question below

sivaraman-27 commented 1 year ago

@TeslaCN, @strongduanmu Could you please reply to my query above.

sivaraman-27 commented 1 year ago

Basically it's like dictionary sharding [Look up table which map ids to shards]

strongduanmu commented 1 year ago

I think it would be more helpful if you could provide sql examples.

sivaraman-27 commented 1 year ago

Added some info below: (The tables below are just for example representation)

Tables involved:

shard_lookup: order_id shard_no
1 1
2 1
3 2
4 2
orders: [This table is present in shard_no = 1] order_id user_id order_amount
1 100 $8
2 101 $10
order_details: [This table is present in shard_no = 1] order_id restaurant_id ordered_on delivered_on
1 8001 epoch_time epoch_time
2 8002 epoch_time epoch_time
order_ratings: [This table is present in shard_no = 1] order_id rating_value rated_on will_recommed_to_others
1 5 epoch_time Yes
2 4 epoch_time Yes

Examples:

  1. For getting amount for order_id = 1, below steps are followed
    • Query 'shard_lookup' for order_id = 1, which gives shard_1
    • Go to shard_1, query 'orders' table for order_id = 1 which will give '$8'
      1. For getting amount for order_id = 4, below steps are followed
    • Query 'shard_lookup' for order_id = 4, which gives shard_2
    • Go to shard_2, query 'orders' table which will give appropriate amount
sivaraman-27 commented 1 year ago

Added some info below: (The tables below are just for example representation)

Tables involved:

  • shard_lookup
  • orders
  • order_details
  • order_ratings

shard_lookup:

order_id shard_no 1 1 2 1 3 2 4 2 orders: [This table is present in shard_no = 1]

order_id user_id order_amount 1 100 $8 2 101 $10 order_details: [This table is present in shard_no = 1]

order_id restaurant_id ordered_on delivered_on 1 8001 epoch_time epoch_time 2 8002 epoch_time epoch_time order_ratings: [This table is present in shard_no = 1]

order_id rating_value rated_on will_recommed_to_others 1 5 epoch_time Yes 2 4 epoch_time Yes

  • The tables above are present in shard_no = 1. Assume these tables with same name and structure is present in all shards.
  • So, for querying any data from 'orders', 'order_details', 'order_ratings', first correct shard is obtained from 'shard_lookup' using order_id, then appropriate table is queried.

Examples:

  1. For getting amount for order_id = 1, below steps are followed

    • Query 'shard_lookup' for order_id = 1, which gives shard_1
    • Go to shard_1, query 'orders' table for order_id = 1 which will give '$8'
  2. For getting amount for order_id = 4, below steps are followed

    • Query 'shard_lookup' for order_id = 4, which gives shard_2
    • Go to shard_2, query 'orders' table which will give appropriate amount
  • I know I can write custom logic, but just wanted to know if apache shardingsphere/proxy will be helpful because you guys would have solved lots of common problems.

@strongduanmu I have add some information above, please checkout it out and let me know your thoughts.

sivaraman-27 commented 1 year ago

@strongduanmu So, a single query will be used everytime, only thing that changes is the shard before executing the query.

Hope I'm Clear. Awaiting your answer.

strongduanmu commented 1 year ago

orders order_details order_ratings @sivaraman-27 These three tables do not contain sharding keys?

strongduanmu commented 1 year ago

Hi @sivaraman-27, thank you for your example. I think you can use SQL Hint for sharding. For orders, order_details, order_ratings, you can specify shard_no in the way of SQL comments when querying, and include order_id in the sql condition. SQL Hint completes the routing of the table, and order_id filters the data. You can refer this doc - https://shardingsphere.apache.org/document/current/en/user-manual/common-config/sql-hint/ to use sql hint.

sivaraman-27 commented 1 year ago

/

orders order_details order_ratings @sivaraman-27 These three tables do not contain sharding keys?

@strongduanmu Nope, only 'shard_lookup' has the keys

sivaraman-27 commented 1 year ago

_Thank you for your example. I think you can use SQL Hint for sharding. For orders, order_details, order_ratings, you can specify shard_no in the way of SQL comments when querying, and include order_id in the sql condition. SQL Hint completes the routing of the table, and orderid filters the data. You can refer this doc - https://shardingsphere.apache.org/document/current/en/user-manual/common-config/sql-hint/ to use sql hint.


Hi @strongduanmu, I have a doubt, if I use 'SQL Hint', then is 'shard_no' (see highlighted text above sent by you) the corresponding datasource name specified in the 'yaml.file'?

Example: I want to know the amount for order_id = 1, then I do the below query

/* SHARDINGSPHERE_HINT: SHARDING_DATABASE_VALUE="datasource_1" */ select amount from orders where order_id = 1;

Then I specify datasource_1 as below in 'yaml.config' file (Or use Java for dynamic updations)

dataSources :
  ds 0:
  dataSourceC1assName: com.zaxxer.hikari.HikariDataSource
  driverC1assName: com.mysql.jdbc.Driver
  jdbcUr1: ***************** I specify hostname of shard_no = 1 *****************************************
  server Timezone=UTC&useSSL=fa1se&useUnicode=true&characterEncoding=UTF-8
  username: --
  password : --
strongduanmu commented 1 year ago

You should config sharding table with hint strategy. For more details, you can refer this doc - https://shardingsphere.apache.org/document/current/en/user-manual/shardingsphere-jdbc/yaml-config/rules/sharding/

sivaraman-27 commented 12 months ago

Hi @strongduanmu, I would like to know if joining two tables in different datasources is feasible with any of apache shardingsphere's tools?

Example: (Assume I have to only keep the two tables in separate datasources)

Query in one datasource works well (without any join): Query = "/ SHARDINGSPHERE_HINT: DATA_SOURCE_NAME=ds_1 / select * from orders";

What's needed: Query = "select u.name from dataSource_1.users u join dataSource_2.orders o on u.user_id = o.user_id";

sivaraman-27 commented 12 months ago

Hi @strongduanmu, I have another doubt, it seems like the only thing that apache sharding sphere handles for us is 'connection pooling'. Could you let me know any other use cases for this library (considering our specific use case).

The reason I'm asking is we can manage the connection pooling (already doing for few), and don't want to incorporate a library for such use case.

Example: Using apache sharding sphere: Query = "/ SHARDINGSPHERE_HINT: DATA_SOURCE_NAME=ds_1 / select * from orders";

Using our own mechanism: Query = select * from ds_1.orders";

strongduanmu commented 12 months ago

I would like to know if joining two tables in different datasources is feasible with any of apache shardingsphere's tools?

You can try the experimental feature - sql federation.

sqlFederation:
  sqlFederationEnabled: true
  executionPlanCache:
    initialCapacity: 2000
    maximumSize: 65535
github-actions[bot] commented 11 months ago

There hasn't been any activity on this issue recently, and in order to prioritize active issues, it will be marked as stale.