apache / shardingsphere

Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.
Apache License 2.0

Why does actual-data-nodes work without specifying physical tables explicitly? #32591

Closed · good-134 closed this issue 1 month ago

good-134 commented 2 months ago

I have observed unexpected behavior in ShardingSphere's table sharding configuration. My configuration doesn't explicitly specify all physical tables, yet it works correctly. I'm trying to understand why this is possible.

Here's my current configuration and sharding code:

  shardingsphere:
    datasource:
      names: ds0
      ds0:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        url: jdbc:mysql://127.0.0.1:3306/demo?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=GMT%2B8&allowPublicKeyRetrieval=true
        username: root
        password: '111111'
    rules:
      sharding:
        key-generate:
          snowflake-gen:
            type: SNOWFLAKE
        shardingAlgorithms:
          agreement-table-complex-algorithm:
            type: CLASS_BASED
            props:
              strategy: COMPLEX
              algorithmClassName: biz.demo.config.sharding.AgreementAlgorithm
        tables:
          agreement:
            actual-data-nodes: ds0.agreement
            table-strategy:
              complex:
                sharding-algorithm-name: agreement-table-complex-algorithm
                sharding-columns: id,create_time
    props:
      sql-show: true
  public Collection<String> doSharding(Collection<String> availableTargetNames, ComplexKeysShardingValue<Long> complexKeysShardingValue) {
      // Range values are fetched here but not used; only precise (=/IN) values are handled.
      Map<String, Range<Long>> columnNameAndRangeValuesMap = complexKeysShardingValue.getColumnNameAndRangeValuesMap();
      Map<String, Collection<Long>> columnNameAndShardingValuesMap = complexKeysShardingValue.getColumnNameAndShardingValuesMap();

      List<String> actualTarget = new ArrayList<>();
      Snowflake snowflakeInstances = Snow.getSnowflakeInstances();
      Collection<Long> idKeyColl = columnNameAndShardingValuesMap.get("id");
      String logicTableName = complexKeysShardingValue.getLogicTableName();
      if (CollectionUtils.isNotEmpty(idKeyColl)) {
          for (Long id : idKeyColl) {
              // Extract the snowflake timestamp from the id, then hash its string form
              // into one of four table suffixes.
              long generateDateTime = snowflakeInstances.getGenerateDateTime(id);
              int suffix = Math.abs(String.valueOf(generateDateTime).hashCode() % 4);
              actualTarget.add(logicTableName + "_" + suffix);
          }
      }
      return actualTarget;
  }
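As a side note, the suffix computation in the loop can be checked in isolation. The sketch below is a hypothetical standalone class (the class name, helper method, and sample timestamps are mine, not from the algorithm above); it only shows that the hash-based bucketing always lands in the range 0..3:

```java
public class SuffixDemo {
    // Mirrors the suffix logic from the algorithm above: hash the string form
    // of the snowflake timestamp, then bucket into 4 tables.
    // Note: hashCode() % 4 is in [-3, 3], so Math.abs keeps it in [0, 3].
    static int suffix(long generateDateTime) {
        return Math.abs(String.valueOf(generateDateTime).hashCode() % 4);
    }

    public static void main(String[] args) {
        long[] sampleTimestamps = {1700000000000L, 1700000000001L, 1712345678901L};
        for (long ts : sampleTimestamps) {
            int s = suffix(ts);
            if (s < 0 || s > 3) {
                throw new AssertionError("suffix out of range: " + s);
            }
            System.out.println("agreement_" + s);
        }
    }
}
```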

In my database, I have four physical tables: agreement_0, agreement_1, agreement_2, and agreement_3. However, in my configuration, I've only specified actual-data-nodes: ds0.agreement instead of the enumerated form actual-data-nodes: ds0.agreement_$->{0..3}.

Despite this, my sharding setup works correctly. The system successfully routes queries to the appropriate physical tables.
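For reference, the enumerated form I expected to need would look like this (a sketch assuming exactly the four tables listed above):

```yaml
rules:
  sharding:
    tables:
      agreement:
        # Inline expression expands to ds0.agreement_0 .. ds0.agreement_3
        actual-data-nodes: ds0.agreement_$->{0..3}
```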

My questions are:

  1. Why does this configuration work without explicitly specifying all physical tables?
  2. Is this behavior intentional or a side effect of how ShardingSphere processes configurations?
  3. Are there any potential risks or downsides to using this simplified configuration in a production environment?
  4. Is there any documentation that explains this behavior?

Is there some default behavior in ShardingSphere that I'm not aware of, or is it related to my custom sharding algorithm? Thank you.

linghengqian commented 2 months ago

Locking the issue was just a slip of the hand. Can you debug the variables actualTarget and availableTargetNames? That behavior sounds kind of weird.

good-134 commented 2 months ago

(screenshots: debugger views of availableTargetNames and actualTarget)

I have debugged the variables availableTargetNames and actualTarget as you suggested. availableTargetNames contains only the logical table name agreement, and actualTarget contains a single physical table name, agreement_2. From this, it seems that my custom sharding algorithm is correctly mapping the logical table to the appropriate physical table.

However, my original question was about why my configuration works even though I didn't specify the physical table names explicitly in the actual-data-nodes property. Could it be that ShardingSphere relies on the sharding algorithm to resolve the physical table names from the logical table name, even when actual-data-nodes doesn't explicitly list all the tables? Is this behavior expected?
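The hypothesis above can be illustrated with a tiny standalone sketch. This is not ShardingSphere's internal routing code, just an illustration of what the debugger showed: the candidate list is derived from actual-data-nodes (so with the single node ds0.agreement it holds only the logic table name), while the algorithm builds physical table names on its own without consulting that list. The class and method names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class RoutingSketch {
    // Hypothetical stand-in for the custom algorithm: it ignores the
    // candidate list and derives the physical table name itself.
    static Collection<String> doSharding(Collection<String> availableTargetNames, long id) {
        List<String> actual = new ArrayList<>();
        int suffix = Math.abs(String.valueOf(id).hashCode() % 4);
        actual.add("agreement_" + suffix); // never consults availableTargetNames
        return actual;
    }

    public static void main(String[] args) {
        // With actual-data-nodes: ds0.agreement, the candidate list holds
        // only the logic table name, mirroring the debugger observation.
        Collection<String> candidates = List.of("agreement");
        Collection<String> result = doSharding(candidates, 123456789L);
        System.out.println(result); // prints one of agreement_0 .. agreement_3
    }
}
```

If such an algorithm returns a table that does not actually exist in the database, the mismatch only surfaces at SQL execution time, which is one reason to keep actual-data-nodes enumerated and in sync with the real tables.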

Thank you for your help.

linghengqian commented 2 months ago

> Is this behavior expected?