go-jet / jet

Type safe SQL builder with code generation and automatic query result data mapping
Apache License 2.0
2.23k stars 110 forks source link

[Question] How to populate nested "entities" that require self-join #252

Closed itaranto closed 11 months ago

itaranto commented 12 months ago

I have the following models:

type Components struct {
    ID uuid.UUID `sql:"primary_key"`
    // ...
}

type Vulnerabilities struct {
    ID uuid.UUID `sql:"primary_key"`
    // ...
}

type ComponentVulnerabilities struct {
    ComponentID     uuid.UUID `sql:"primary_key"`
    VulnerabilityID uuid.UUID `sql:"primary_key"`
    // ...
}

type ComponentChildren struct {
    ComponentID uuid.UUID `sql:"primary_key"`
    ChildID     uuid.UUID `sql:"primary_key"`
    // ...
}

And this is the struct where I want the data to be mapped to:

type component struct {
    model.Components
    Vulnerabilities []model.Vulnerabilities
    Children        []component
}

Notice the Children field refers to itself : component instead of model.Components because I want children to have the same data as the parent, in this case, the Vulnerabilities.

If I didn't want to populate the children's sub-relations, I could have done something like this:

type component struct {
    model.Components
    Vulnerabilities []model.Vulnerabilities
    Children        []model.Components `alias:"children"`
}

And then make a query like this:

    tableChildren := table.Components.AS("children")

    stmt := postgres.SELECT(
        table.Components.AllColumns,
        table.Vulnerabilities.AllColumns,
        tableChildren.AllColumns,
    ).FROM(
        table.Components.
            LEFT_JOIN(
                table.ComponentVulnerabilities,
                table.ComponentVulnerabilities.ComponentID.EQ(table.Components.ID),
            ).
            LEFT_JOIN(
                table.Vulnerabilities,
                table.Vulnerabilities.ID.EQ(table.ComponentVulnerabilities.VulnerabilityID),
            ).
            LEFT_JOIN(
                table.ComponentChildren,
                table.ComponentChildren.ComponentID.EQ(table.Components.ID),
            ).
            LEFT_JOIN(
                tableChildren,
                tableChildren.ID.EQ(table.ComponentChildren.ChildID),
            ),
    )

In order to get the children with its sub-entities, I know that I need to do a sub-query but I'm not sure if the QRM supports this.

I tried something like this:

    tmpTableChildren := postgres.SELECT(
        table.Components.AllColumns,
        table.Vulnerabilities.AllColumns,
    ).FROM(
        table.Components.
            LEFT_JOIN(
                table.ComponentVulnerabilities,
                table.ComponentVulnerabilities.ComponentID.EQ(table.Components.ID),
            ).
            LEFT_JOIN(
                table.Vulnerabilities,
                table.Vulnerabilities.ID.EQ(table.ComponentVulnerabilities.VulnerabilityID),
            ),
    ).AsTable("tmp_children")

    tmpTableChildrenID := table.Components.ID.From(tmpTable)

    stmt := postgres.SELECT(
        table.Components.AllColumns,
        table.Vulnerabilities.AllColumns,
        tmpTableChildren.AllColumns(),
    ).FROM(
        table.Components.
            LEFT_JOIN(
                table.ComponentVulnerabilities,
                table.ComponentVulnerabilities.ComponentID.EQ(table.Components.ID),
            ).
            LEFT_JOIN(
                table.Vulnerabilities,
                table.Vulnerabilities.ID.EQ(table.ComponentVulnerabilities.VulnerabilityID),
            ).
            LEFT_JOIN(
                table.ComponentChildren,
                table.ComponentChildren.ComponentID.EQ(table.Components.ID),
            ).
            LEFT_JOIN(
                tmpTableChildren,
                tmpTableChildrenID.EQ(table.ComponentChildren.ChildID),
            ),
    )

But I can't make the mapper to fill the structs in the way I need it.

itaranto commented 12 months ago

Anyway, as a side-note, I tried the second approach directly into a sample DB and the performance of that query is just terrible.

I may need to evaluate a different approach about how to populate this.

houten11 commented 12 months ago

Yeah, you'll need to be careful with the number of joins. Although the number of joins is not limited with jet, the performance can suffer if the query returns a huge number of rows.
For instance, in your first query, you'll end up with a row count equal to vulnerabilities row count x children row count. If there are 10 vulnerabilities and 10 children you'll end up with 100 rows, which should be fine. But if you also want to get children's vulnerabilities, you might end up with a huge number of rows.

One possible way to do it is to have 2 queries. In the first query, you fill model.Components and Vulnerabilities and in the second Children.

type component struct {
    model.Components            //<-- first query
    Vulnerabilities []model.Vulnerabilities //<-- first query
    Children        []component     //<-- second query
}

If stmt1 and stmt2 are two queries, you'll scan:

var dest component

err := stmt1.Query(db, &dest)

err := stmt2.Query(db, &dest.Children)
itaranto commented 12 months ago

Yeah, you'll need to be careful with the number of joins. Although the number of joins is not limited with jet, the performance can suffer if the query returns a huge number of rows. For instance, in your first query, you'll end up with a row count equal to vulnerabilities row count x children row count. If there are 10 vulnerabilities and 10 children you'll end up with 100 rows, which should be fine. But if you also want to get children's vulnerabilities, you might end up with a huge number of rows.

One possible way to do it is to have 2 queries. In the first query, you fill model.Components and Vulnerabilities and in the second Children.

type component struct {
  model.Components            //<-- first query
  Vulnerabilities []model.Vulnerabilities //<-- first query
  Children        []component     //<-- second query
}

If stmt1 and stmt2 are two queries, you'll scan:

var dest component

err := stmt1.Query(db, &dest)

err := stmt2.Query(db, &dest.Children)

Thanks for the suggestion, but I cannot do the query you proposed since I need to map the results to a slice of components (I should have made that clear from the start).

Like this:

    components := []*component{}
    if err := stmt.Query(db, &components); err != nil {
        return nil, err
    }

And, if I resolve it naively, it results in the N+1 problem.

itaranto commented 12 months ago

Well, I came up with a query that returns the correct information, but I don't know how to map it using the QRM:

    stmt:= postgres.SELECT(
        table.Components.AllColumns,
        table.Vulnerabilities.AllColumns,
        tableChildren.AllColumns,
        tableChildrenVulnerabilities.AllColumns,
    ).FROM(
        table.Components.
            LEFT_JOIN(
                table.ComponentVulnerabilities,
                table.ComponentVulnerabilities.ComponentID.EQ(table.Components.ID),
            ).
            LEFT_JOIN(
                table.Vulnerabilities,
                table.Vulnerabilities.ID.EQ(table.ComponentVulnerabilities.VulnerabilityID),
            ).
            LEFT_JOIN(
                table.ComponentChildren,
                table.ComponentChildren.ComponentID.EQ(table.Components.ID),
            ).
            LEFT_JOIN(
                tableChildren,
                tableChildren.ID.EQ(table.ComponentChildren.ChildID),
            ).
            LEFT_JOIN(
                tableChildrenComponentVulnerabilities,
                tableChildrenComponentVulnerabilities.ComponentID.EQ(tableChildren.ID),
            ).
            LEFT_JOIN(
                tableChildrenVulnerabilities,
                tableChildrenVulnerabilities.ID.EQ(tableChildrenComponentVulnerabilities.VulnerabilityID),
            ),
    )

In SQL (simplified version with just IDs):

SELECT components.id AS "components.id",
     vulnerabilities.id AS "vulnerabilities.id",
     children.id AS "children.id",
     children_vulnerabilities.id AS "children_vulnerabilities.id"
FROM components
     LEFT JOIN component_vulnerabilities ON (component_vulnerabilities.component_id = components.id)
     LEFT JOIN vulnerabilities ON (vulnerabilities.id = component_vulnerabilities.vulnerability_id)
     LEFT JOIN component_children ON (component_children.component_id = components.id)
     LEFT JOIN components AS children ON (children.id = component_children.child_id)
     LEFT JOIN component_vulnerabilities AS children_component_vulnerabilities ON (children_component_vulnerabilities.component_id = children.id)
     LEFT JOIN vulnerabilities AS children_vulnerabilities ON (children_vulnerabilities.id = children_component_vulnerabilities.vulnerability_id);
houten11 commented 12 months ago

Aha, in that case your destination would need to have a separate type for children. For instance:

type component struct {
    model.Components
    Vulnerabilities []model.Vulnerabilities
    Children        []struct{
        model.Components `alias:"children"`
        Vulnerabilities  []model.Vulnerabilities `alias:"children_vulnerabilities"`
    }
}

You can't use the same component type for children because qrm needs to differentiate component and component children destination type. Also, you can't alias complex type, only model and custom model. Still, beware of the number of rows this query can return, and maybe consider pagination or some other approach.

itaranto commented 11 months ago

Aha, in that case your destination would need to have a separate type for children. For instance:

type component struct {
  model.Components
  Vulnerabilities []model.Vulnerabilities
  Children        []struct{
      model.Components `alias:"children"`
      Vulnerabilities  []model.Vulnerabilities `alias:"children_vulnerabilities"`
  }
}

You can't use the same component type for children because qrm needs to differentiate component and component children destination type. Also, you can't alias complex type, only model and custom model.

That worked! This library rocks!

Still, beware of the number of rows this query can return, and maybe consider pagination or some other approach.

Yes, I'm aware of this, thank you.

itaranto commented 11 months ago

Sorry to re-open this...

Here's the thing: I thought it worked, but using the query I shared above, it only fills the first vulnerability for each child.

For what I understand, given that my query returns something like:

 components.id | vulnerabilities.id | children.id  | children_vulnerabilities.id
---------------+--------------------+--------------+-----------------------------
 component_00  |  vulnerability_00  | component_01 |       vulnerability_11
 component_00  |  vulnerability_00  | component_01 |       vulnerability_12
 component_00  |  vulnerability_01  | component_01 |       vulnerability_11
 component_00  |  vulnerability_01  | component_01 |       vulnerability_12
 component_00  |  vulnerability_02  | component_02 |       vulnerability_21
 component_00  |  vulnerability_03  | component_02 |       vulnerability_21
 ...

then, using the following structs:

type component struct {
    model.Components
    Vulnerabilities []model.Vulnerabilities
    Children        []child
}

type child struct {
    model.Components `alias:"children"`
    Vulnerabilities  []model.Vulnerabilities `alias:"children_vulnerabilities"`
}

the QRM should populate the structs to something like:

{
  "components": [
    {
      "id": "component_00",
      "vulnerabilities": [
        {
          "id": "vulnerability_00"
        },
        {
          "id": "vulnerability_01"
        },
        {
          "id": "vulnerability_02"
        },
        {
          "id": "vulnerability_03"
        }
      ],
      "children": [
        {
          "id": "component_01",
          "vulnerabilities": [
            {
              "id": "vulnerability_11"
            },
            {
              "id": "vulnerability_12"
            }
          ]
        },
        {
          "id": "component_02",
          "vulnerabilities": [
            {
              "id": "vulnerability_21"
            }
          ]
        }
      ]
    }
  ]
}

Am I doing something wrong?

Is there a more efficient way to debug the QRM?

go-jet commented 11 months ago

Hi @itaranto , This is a bug. alias:"children_vulnerabilities" is not accounted when qrm group key is constructed. I've made a fix on bug252 branch. Give it a try. On master branch, workaround can be made by wrapping model.Vulnerabilities into new struct:

type child struct {
    model.Components `alias:"children"`
    Vulnerabilities  []struct { 
        model.Vulnerabilities `alias:"children_vulnerabilities"`
    }
}
itaranto commented 11 months ago

Well, I'm glad this is a bug then...

I tried your branch and it solved my issue in regards to only populating the first vulnerability for each children.

Now, there's another similar issue this bug252 branch solved, but instead of reproducing it in my unit tests with a dummy database, I found this in a real one.

The problem was the following: Given lots of children of a given component that have the same vulnerability, then the mapper would map all of those (duplicated) vulnerabilities into the first child and not the others.

My guess is these two issues are basically the same.

go-jet commented 11 months ago

Basically, a bug is if you have two model slices of the same type in your destination, and one slice is aliased then qrm would not populate destination correctly:

type struct Dest {
    ....
    TList []model.T 
    ....
         TList2 []model.T `alias:"t_alias"`
    ....
}
// if insetead TList2 []model.T `alias:"t_alias"` a new struct is used,
// like TList2 struct{ []model.T `alias:"t_alias"` } then it's not an issue

This usually requires, like in your case, self join and 2 N joins on the same level, which is not a common case, and probably a reason why it hasn't been detected so far.

go-jet commented 11 months ago

The problem was the following: Given lots of children of a given component that have the same vulnerability, then the mapper would map all of those (duplicated) vulnerabilities into the first child and not the others.

Could you share your destination? It would be easier to confirm.

itaranto commented 11 months ago

Could you share your destination? It would be easier to confirm.

The destination struct is the same as in my previous comment.

My point was, about two (seemingly) different issues that got solved by this fix:

go-jet commented 11 months ago

Fixed with Release v2.10.1.