cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev
Other
17.85k stars 1.77k forks source link

Unable to get total number of records using "total":true attribute for MSSQL database. #7446

Closed niravjprabtani closed 1 month ago

niravjprabtani commented 10 months ago

Describe the bug We are currently implementing server-side pagination in a web application, utilizing the "total": true attribute to obtain an exact count of records based on given conditions, irrespective of limit and offset. This helps in acquiring insights into the total number of pages for the application. However, when executing the Cube.js API, an error occurs.

{
    "error": "Error: The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.",
    "requestId": "dd9ab55d-3e9f-4db8-ab96-7526a0c7eeb8-span-1"

Investigation and Findings Upon switching the database from MSSQL to Snowflake, it became evident that the issue pertains specifically to MSSQL. Further investigation revealed that Cube.js internally executes an identical query without limit and offset to obtain the total record count for the API request. Cube.js wraps the original query within a "select count(*) from" statement. This method fails in MSSQL due to its lack of support for an order statement within this clause.

To Reproduce Steps to reproduce the behavior:

  1. Use MSSQL as a data source
  2. Keep "total":true in the JSON request for any API
  3. Execute load endpoint for Cube.js API
  4. See error

Expected behavior It should give results along with the total number of records.

Minimally reproducible Cube Schema You can use selects without tables in order to achieve that as follows.

cube(`Orders`, {
  sql: `
  select 1 as id, 100 as amount, 'new' status
  UNION ALL
  select 2 as id, 200 as amount, 'new' status
  UNION ALL
  select 3 as id, 300 as amount, 'processed' status
  UNION ALL
  select 4 as id, 500 as amount, 'processed' status
  UNION ALL
  select 5 as id, 600 as amount, 'shipped' status
  `,
  measures: {
    count: {
      type: `count`,
    },
    totalAmount: {
      sql: `amount`,
      type: `sum`,
    },
    toRemove: {
      type: `count`,
    },
  },
  dimensions: {
    status: {
      sql: `status`,
      type: `string`,
    },
  },
});

Please invoke below API request to get the API response

Sample request URL : http://localhost:4000/cubejs-api/v1/load Query param :

{
    "dimensions": [
        "Orders.status"
    ],
    "timeDimensions": [],
    "total": true,
    "limit":10,
    "offset":1
}

This should provide list of status along with total number of records.

Version: Here is my sample docker-compose.xml

version: '2.2'
services:
  cube:
    image: cubejs/cube:latest
    ports:
      - 4000:4000
      - 15432:15432
    environment:
      - CUBEJS_DEV_MODE=true
      - CUBEJS_DB_HOST=*******
      - CUBEJS_DB_PORT=1433
      - CUBEJS_DB_NAME=*******
      - CUBEJS_DB_USER=*******
      - CUBEJS_DB_PASS=*******
      - CUBEJS_DB_TYPE=mssql
      - CUBEJS_EXTERNAL_DEFAULT=true
      - CUBEJS_SCHEDULED_REFRESH_DEFAULT=true
      - CUBEJS_SCHEMA_PATH=model
      - CUBEJS_DB_SSL=true
      - CUBEJS_JWK_URL=*******
    volumes:
      - .:/cube/conf
      - *******

Additional context Investigation and Findings: Upon switching the database from MSSQL to Snowflake, it became evident that the issue pertains specifically to MSSQL. Further investigation revealed that Cube.js internally executes an identical query without limit and offset to obtain the total record count for the API request. Cube.js wraps the original query within a "select count(*) from" statement. This method fails in MSSQL due to its lack of support for an order statement within this clause.

Queries:

  1. Cube.js Formed Query for Total Records:
    select count(*) "total_count" from (SELECT COLUMN1, COUNT(COLUMN1) FROM TABLENAME GROUP BY 1 ORDER BY 1 ASC) "original_query"
  2. Actual Query Formed by Cube.js to Retrieve Results:
    SELECT COLUMN1, COUNT(COLUMN1) FROM TABLENAME GROUP BY 1 ORDER BY 1 ASC OFFSET 1 ROWS FETCH NEXT 10 ROWS ONLY

    The problem lies in query number 1, as the "order by" clause is within the "select count(*) from" statement, which is not supported in MSSQL. We seek guidance on how to address this issue to achieve the intended functionality. Despite attempting various solutions, using the "total": true attribute in the JSON request consistently results in an exception.

Any assistance in resolving this matter would be greatly appreciated.

github-actions[bot] commented 10 months ago

If you are interested in working on this issue, please leave a comment below and we will be happy to assign the issue to you. If this is the first time you are contributing a Pull Request to Cube.js, please check our contribution guidelines. You can also post any questions while contributing in the #contributors channel in the Cube.js Slack.

igorlukanin commented 10 months ago

@paveltiunov It looks like this might be fixed at the MSSQL driver level—or at a more general level, so that order will be stripped from queries used to calculate totals. Might be a single-line fix here: https://github.com/cube-js/cube/blob/0723bd5421619001aca54b0331724cb6d23fa6dd/packages/cubejs-api-gateway/src/gateway.ts#L1413-L1416

rdwoodring commented 6 months ago

We're running into this as well. I'm taking a look at fixing it with @Nick-PC.

@igorlukanin so far we haven't had any luck making the update in cube/packages/cubejs-api-gateway/src/gateway.ts.

We have tried setting normalizedTotal.order = null and also delete normalizedTotal.order to no avail. We'll continue poking at it, but any guidance would be greatly appreciated.

rdwoodring commented 6 months ago

@igorlukanin I think we found the proper fix in the above pull request

igorlukanin commented 1 month ago

Thanks for your contribution @rdwoodring 🙌

THe fix will be released in v0.35.81 soon.