cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev
Other
17.93k stars 1.78k forks source link

Incorrect join order or can't find path two join #6390

Open ezequielp opened 1 year ago

ezequielp commented 1 year ago

Describe the bug Wrong SQL query generated or can't find path between two cubes.

To Reproduce Steps to reproduce the behavior:

  1. Run query:
    {
    "dimensions": [
    "B.cName",
    "E.name"
    ]
    }
  2. Get error: Can't find join path to join 'C', 'E'
  3. Replace sql: `${C.name}`, with sql: `${CUBE.C.name}`,
  4. Repeat the query. Now, it runs correctly
  5. Run query:
    {
    "dimensions": [
    "B.cName",
    "A.id"
    ]
    }
  6. The error will now be:
    ER_BAD_FIELD_ERROR: Unknown column 'b.id' in 'on clause'

    This comes from an incorrect SQL statement that is joining tables in the wrong order:

SELECT
  `c`.name `b__c_name`,
  `a`.id `a__id`
FROM
  (
    -- ...
  ) AS `a`
  LEFT JOIN (
    -- ...
  ) AS `a1` ON `a`.id = `a1`.a_id
  LEFT JOIN (
    -- ...
  ) AS `c` ON `b`.id = `c`.b_id -- this join should be after the next one
  LEFT JOIN (
    -- ...
  ) AS `b` ON `a1`.b_id = `b`.id
GROUP BY
  1,
  2
ORDER BY
  1 ASC

Expected behavior For the first schema, cube should be able to find the path to join B to E (which is through A). Alternatively, for the second version of the schema, cube should be able to generate the right SQL query.

Minimally reproducible Cube Schema

cube("A", {
  sql: `SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3`,
  joins: {
    A1: {
      relationship: "hasMany",
      sql: `${CUBE}.id = ${A1}.a_id`,
    },
    D: {
      relationship: "belongsTo",
      sql: `${CUBE}.id = ${D}.a_id`,
    }
  },
  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
    },
    dName: {
      sql: `${CUBE.D.name}`,
      type: `string`,
    }
  }
});

cube("A1", {
  sql: `SELECT 1 AS id, 1 AS a_id, 1 AS b_id UNION ALL SELECT 2, 1, 1 UNION ALL SELECT 3, 2, 2`,
  joins: {
    B: {
      relationship: "belongsTo",
      sql: `${CUBE}.b_id = ${B}.id`,
    }
  }
});

cube("B", {
  sql: `SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3`,
  joins: {
    C: {
      relationship: "hasMany",
      sql: `${CUBE}.id = ${C}.b_id`,
    },
    E: {
      relationship: "hasMany",
      sql: `${CUBE}.id = ${E}.b_id`,
    }
  },
  dimensions: {
    cName: {
      // Using C.name here makes query cName and E.name work
      // But it makes A.dName with C.cName fail
      sql: `${CUBE.C.name}`,
      type: `string`
    },
    id: {
      sql: `id`,
      type: `number`,
    }
  }
});

cube("C", {
  sql: `SELECT 1 AS id, 1 AS b_id, "c1" AS name UNION ALL SELECT 2, 2, "c2"`,
  dimensions: {
    name: {
      sql: `name`,
      type: `string`,
      shown: false,
    }
  }
});

cube("D", {
  sql: `SELECT 1 AS id, 1 AS a_id, "d1" AS name UNION ALL SELECT 2, 2, "d2"`,
  dimensions: {
    name: {
      sql: `name`,
      type: `string`,
    }
  }
});

cube("E", {
  sql: `SELECT 1 AS id, 1 AS b_id, "e1" AS name UNION ALL SELECT 2, 2, "e2"`,
  dimensions: {
    name: {
      sql: `name`,
      type: `string`,
    }
  }
});

Version: [e.g. 0.32.16]

Additional context This problem wasn't happening on v0.30

paveltiunov commented 1 year ago

@ezequielp This is correct behavior. Please see https://cube.dev/docs/schema/fundamentals/joins#directions-of-joins-transitive-join-pitfalls. You might want to introduce joins on both sides of a relationship and use views to provide explicit direction of joins.

ezequielpozzotromzo commented 1 year ago

@paveltiunov but, cName is defined on B, and B is joined to E, so the join could be correctly determined from the query, right? It is not as I was explicitly requesting C.Name and E.Name, I'm requesting B.cName and E.Name...

Also, what about the second part of the problem?