apache / shardingsphere

Distributed SQL transaction & query engine for data sharding, scaling, encryption, and more - on any database.
Apache License 2.0
19.88k stars 6.73k forks source link

Wrong output when query contains limit, group by and distinct combination #29841

Open Afsalmc opened 8 months ago

Afsalmc commented 8 months ago

Bug Report

Which version of ShardingSphere did you use?

5.4.1

Which project did you use? ShardingSphere-JDBC or ShardingSphere-Proxy?

ShardingSphere-JDBC

Expected behavior

Query returns accurate results.

Actual behavior

Actual Results, not number of rows returned depends on the limit specified after group by keyword.

Reason analyze (If you can)

If the user entered query is : SELECT COUNT(distinct Id2),Id1 C1 FROM ACCOUNT GROUP BY Id1 LIMIT 10

before querying in individual shards, the query gets converted into SELECT distinct Id2,,Id1 C1 FROM ACCOUNT LIMIT 10

hence the combined output depends on the limit value, limit value no longer has just the control over the number of rows returned but it is influencing the actual data.

Steps to reproduce the behavior, such as: SQL to execute, sharding rule configuration, when exception occur etc.

More than one Shards to reproduce the issue.

strongduanmu commented 8 months ago

Hi @Afsalmc, thank you for your feedback. Can you provide your configuration and table init sql to help us reproduce this bug?

Afsalmc commented 8 months ago

@strongduanmu

dataSources:
  db0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.microsoft.sqlserver
    jdbcUrl: jdbc:sqlserver://db1.database.windows.net;databaseName=db1;loginTimeout=300;encrypt=false;
    username: adm
  db1:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.microsoft.sqlserver
    jdbcUrl: jdbc:sqlserver://db11.database.windows.net;databaseName=db11;loginTimeout=300;encrypt=false;
    username: adm
rules:
  - !SQL_FEDERATION
    sqlFederationEnabled: true
    executionPlanCache:
      initialCapacity: 1000
      maximumSize: 4000
  - !SHARDING
    tables:
      ACCOUNT:
        actualDataNodes: dbo.Account,db1.Account

I haven't used any init SQL. The schema was already defined and I just connected using shardingsphere jdbc. I also noticed that I'm only getting this weird behavior when the query contains Distinct keyword : select count(DISTINCT accountid) as c,Stamp from Account group by stamp

Not when select count(accountid) as c,Stamp from Account group by stamp

Afsalmc commented 8 months ago

@strongduanmu Do you want any other info from me ?

strongduanmu commented 8 months ago

@strongduanmu Do you want any other info from me ?

Thanks for the information. Currently, ShardingSphere's support for SQL Server is incomplete, and there are still some problems with the sharding feature. Are you interested in fixing this bug?

Afsalmc commented 8 months ago

@strongduanmu I am still learning how it works. I can reproduce this with MySQL schemas also.

strongduanmu commented 8 months ago

@strongduanmu I am still learning how it works. I can reproduce this with MySQL schemas also.

@Afsalmc Thank you for your feedback. In addition, I found that your sharding rules do not have a sharding strategy configured, which may cause problems, can you add the sharding strategy and try again?

Afsalmc commented 8 months ago

@strongduanmu Forgot to mention that I am using java API to add default sharding strategy. shardingRuleConfig.setDefaultDatabaseShardingStrategy(new StandardShardingStrategyConfiguration("Accountid", "shardingLogic"));

shardingLogic is a custom sharding Logic


public final class ShardingLogic implements StandardShardingAlgorithm<Integer> {
    private Map<Integer,String> shardMap = new HashMap<>();
    private shardMapLoader shardMapLoader;
    private AbstractConfiguration config;
    public ShardingLogic() throws ConfigurationException {
        config  = Main.getConfig();
        var shardConnection = config.getString("shardMapJdbcUrl");
        var shardUser = config.getString("shardMapUserName");
        var shardPass = config.getString("shardMapPass");
        shardMapLoader = new shardMapLoader(shardConnection,shardUser,shardPass);
        shardMap =  shardMapLoader.LoadtoJavaMap();
    }
    @Override
    public String doSharding(final Collection<String> availableTargetNames, final PreciseShardingValue<Integer> shardingValue) {
        for (String each : availableTargetNames) {
             if(!shardMap.containsKey(shardingValue.getValue()))
                 break;
             if(availableTargetNames.size() ==1){
                 return each;
             }
            var dbSource = shardMap.get(shardingValue.getValue());
             if(!each.contains(">")){
                 for (var dataSources: each.split("\\|")) {
                     if(dbSource.equals(dataSources)){
                         return each;
                     }
                 }
             }
            var shardingSchema = shardingValue.getLogicTableName().split("__",2)[0];
             var targetSchemaSet = each.split(">")[0];
             String[] targetSchemas = null != config.getString(targetSchemaSet+"Schema") ? config.getString(targetSchemaSet+"Schema").split(",") : new String[]{""};
             var isMatchingSchema = (null !=shardingSchema) && Arrays.asList(targetSchemas).contains(shardingSchema);
             if(isMatchingSchema){
                 var targetSources = each.split(">")[1].split("\\|");
                 for(var targetSource:targetSources){
                     if(dbSource.equals(targetSource))
                         return each;
                 }
             }
        }
        return null;
    }

    @Override
    public Collection<String> doSharding(final Collection<String> availableTargetNames, final RangeShardingValue<Integer> shardingValue) {
        return availableTargetNames;
    }
}
github-actions[bot] commented 6 months ago

There hasn't been any activity on this issue recently, and in order to prioritize active issues, it will be marked as stale.