apache / shardingsphere

Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.
Apache License 2.0

Why does actual-data-nodes work without specifying physical tables explicitly? #32591

Closed · good-134 closed this issue 1 month ago

good-134 commented 2 months ago

I have observed unexpected behavior in ShardingSphere's table sharding configuration. My configuration doesn't explicitly specify all physical tables, yet it works correctly. I'm trying to understand why this is possible.

Here's my current configuration and sharding code:

  shardingsphere:
    datasource:
      names: ds0
      ds0:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        url: jdbc:mysql://127.0.0.1:3306/demo?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=GMT%2B8&allowPublicKeyRetrieval=true
        username: root
        password: '111111'
    rules:
      sharding:
        key-generate:
          snowflake-gen:
            type: SNOWFLAKE
        shardingAlgorithms:
          agreement-table-complex-algorithm:
            type: CLASS_BASED
            props:
              strategy: COMPLEX
              algorithmClassName: biz.demo.config.sharding.AgreementAlgorithm
        tables:
          agreement:
            actual-data-nodes: ds0.agreement
            table-strategy:
              complex:
                sharding-algorithm-name: agreement-table-complex-algorithm
                sharding-columns: id,create_time
    props:
      sql-show: true
  public Collection<String> doSharding(Collection<String> availableTargetNames, ComplexKeysShardingValue<Long> complexKeysShardingValue) {
      // Range values are fetched here but not used; only precise (=/IN) values are handled.
      Map<String, Range<Long>> columnNameAndRangeValuesMap = complexKeysShardingValue.getColumnNameAndRangeValuesMap();
      Map<String, Collection<Long>> columnNameAndShardingValuesMap = complexKeysShardingValue.getColumnNameAndShardingValuesMap();

      List<String> actualTarget = new ArrayList<>();
      Snowflake snowflakeInstances = Snow.getSnowflakeInstances();
      Collection<Long> idKeyColl = columnNameAndShardingValuesMap.get("id");
      String logicTableName = complexKeysShardingValue.getLogicTableName();
      if (CollectionUtils.isNotEmpty(idKeyColl)) {
          for (Long id : idKeyColl) {
              // Extract the snowflake timestamp from the id, then hash its string form
              // into one of four table suffixes.
              long generateDateTime = snowflakeInstances.getGenerateDateTime(id);
              int suffix = Math.abs(String.valueOf(generateDateTime).hashCode() % 4);
              actualTarget.add(logicTableName + "_" + suffix);
          }
      }
      return actualTarget;
  }
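As a side note, the suffix computation in the loop can be checked in isolation. The sketch below is a hypothetical standalone class (the class name, helper method, and sample timestamps are mine, not from the algorithm above); it only shows that the hash-based bucketing always lands in the range 0..3:

```java
public class SuffixDemo {
    // Mirrors the suffix logic from the algorithm above: hash the string form
    // of the snowflake timestamp, then bucket into 4 tables.
    // Note: hashCode() % 4 is in [-3, 3], so Math.abs keeps it in [0, 3].
    static int suffix(long generateDateTime) {
        return Math.abs(String.valueOf(generateDateTime).hashCode() % 4);
    }

    public static void main(String[] args) {
        long[] sampleTimestamps = {1700000000000L, 1700000000001L, 1712345678901L};
        for (long ts : sampleTimestamps) {
            int s = suffix(ts);
            if (s < 0 || s > 3) {
                throw new AssertionError("suffix out of range: " + s);
            }
            System.out.println("agreement_" + s);
        }
    }
}
```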

In my database, I have four physical tables: agreement_0, agreement_1, agreement_2, and agreement_3. However, in my configuration, I've only specified actual-data-nodes: ds0.agreement instead of the enumerated form actual-data-nodes: ds0.agreement_$->{0..3}.

Despite this, my sharding setup works correctly. The system successfully routes queries to the appropriate physical tables.
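For reference, the enumerated form I expected to need would look like this (a sketch assuming exactly the four tables listed above):

```yaml
rules:
  sharding:
    tables:
      agreement:
        # Inline expression expands to ds0.agreement_0 .. ds0.agreement_3
        actual-data-nodes: ds0.agreement_$->{0..3}
```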

My questions are:

  1. Why does this configuration work without explicitly specifying all physical tables?
  2. Is this behavior intentional or a side effect of how ShardingSphere processes configurations?
  3. Are there any potential risks or downsides to using this simplified configuration in a production environment?
  4. Is there any documentation that explains this behavior?

Is there some default behavior in ShardingSphere that I'm not aware of, or is it related to my custom sharding algorithm? Thank you.

linghengqian commented 2 months ago

Locking the issue was just a slip of the hand. Can you debug the variables actualTarget and availableTargetNames? That behavior sounds kind of weird.

good-134 commented 2 months ago

(screenshots: debugger views of availableTargetNames and actualTarget)

I have debugged the variables availableTargetNames and actualTarget as you suggested. availableTargetNames contains only the logical table name agreement, and actualTarget contains a single physical table name, agreement_2. From this, it seems that my custom sharding algorithm is correctly mapping the logical table to the appropriate physical table.

However, my original question was about why my configuration works even though I didn't specify the physical table names explicitly in the actual-data-nodes property. Could it be that ShardingSphere relies on the sharding algorithm to resolve the physical table names from the logical table name, even when actual-data-nodes doesn't explicitly list all the tables? Is this behavior expected?
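The hypothesis above can be illustrated with a tiny standalone sketch. This is not ShardingSphere's internal routing code, just an illustration of what the debugger showed: the candidate list is derived from actual-data-nodes (so with the single node ds0.agreement it holds only the logic table name), while the algorithm builds physical table names on its own without consulting that list. The class and method names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class RoutingSketch {
    // Hypothetical stand-in for the custom algorithm: it ignores the
    // candidate list and derives the physical table name itself.
    static Collection<String> doSharding(Collection<String> availableTargetNames, long id) {
        List<String> actual = new ArrayList<>();
        int suffix = Math.abs(String.valueOf(id).hashCode() % 4);
        actual.add("agreement_" + suffix); // never consults availableTargetNames
        return actual;
    }

    public static void main(String[] args) {
        // With actual-data-nodes: ds0.agreement, the candidate list holds
        // only the logic table name, mirroring the debugger observation.
        Collection<String> candidates = List.of("agreement");
        Collection<String> result = doSharding(candidates, 123456789L);
        System.out.println(result); // prints one of agreement_0 .. agreement_3
    }
}
```

If such an algorithm returns a table that does not actually exist in the database, the mismatch only surfaces at SQL execution time, which is one reason to keep actual-data-nodes enumerated and in sync with the real tables.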

Thank you for your help.

linghengqian commented 2 months ago

> Is this behavior expected?