go-jet / jet

Type safe SQL builder with code generation and automatic query result data mapping
Apache License 2.0

Please add a PGX adapter #381

Open veqryn opened 2 months ago

veqryn commented 2 months ago

Is your feature request related to a problem? Please describe. Right now Jet's default is to generate postgres array columns as Go strings. You can override the generated types, but then you also have to implement your own scan/value functions. Using pgx directly (github.com/jackc/pgx/v5) lets us scan postgres arrays straight into a slice ([]int64), among many other great features. Unfortunately, Jet only accepts the standard library database/sql interface, which forces us to go through github.com/jackc/pgx/v5/stdlib.
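For example, a minimal sketch of what pgx v5 allows natively (pool is an existing *pgxpool.Pool):

// The bigint[] column scans directly into a Go slice,
// with no intermediate string representation.
var ids []int64
err := pool.QueryRow(ctx, "SELECT ARRAY[1, 2, 3]::bigint[]").Scan(&ids)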

Describe the solution you'd like Other scanning libraries such as scany and ksql provide an adapter that lets their library work with both the standard library interface and pgx directly. It would be great if Jet provided something similar.

go-jet commented 2 months ago

Right now Jet's default is to generate postgres array columns as Go strings. You can override the generated types, but then you also have to implement your own scan/value functions.

Check https://github.com/go-jet/jet/pull/380, this PR should add full array support.

Other scanning libraries such as scany and ksql provide an adapter that lets their library work with both the standard library interface and pgx directly.

I don't see how it is possible to have an adapter that converts pgx calls to standard database/sql calls. For instance, the standard database connection has this method:

QueryContext(ctx context.Context, query string, args ...interface{}) (*sql.Rows, error)

All sql.Rows fields are private and there is no public constructor, so no adapter is possible. The only way is to add a new set of methods just for pgx.

Also, note that pgx does provide a standard database/sql interface: "github.com/jackc/pgx/v4/stdlib".
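For example, a minimal sketch of that stdlib route (shown here with the v5 stdlib package; dsn, stmt, and dest are placeholders):

import (
    "database/sql"
    "log"

    _ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx" driver with database/sql
)

db, err := sql.Open("pgx", dsn)
if err != nil {
    log.Fatal(err)
}
defer db.Close()

// jet statements accept anything satisfying qrm.Queryable, including *sql.DB
err = stmt.QueryContext(ctx, db, &dest)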

veqryn commented 2 months ago

You'd have to look at scany and ksql to see how they do it. It might not be an adapter; it might be a copy of the implementation or something else.

safaci2000 commented 2 months ago

At least for scany, they simply re-implement the whole thing, so there is some duplication. They define their own Rows interface and then have one implementation backed by sql.Rows and another backed by pgx.Rows.

The interface itself is generic and has no reference to sql.Rows, which I don't believe is the case for Jet.

qrm.Queryable returns sql.Rows; it would have to be modified to return a more generic object/iterator instead of the concrete sql.Rows type.
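A minimal sketch of that scany-style abstraction (the names here are hypothetical, not jet's actual API):

// Rows is a driver-agnostic subset of the row-iteration methods a scanner needs.
// Both *sql.Rows and pgx.Rows (v5) already satisfy it as-is.
// (Close is omitted because the signatures differ: sql.Rows.Close returns an
// error, pgx.Rows.Close does not.)
type Rows interface {
    Next() bool
    Scan(dest ...any) error
    Err() error
}

// collect drains rows into a slice using a per-row scan function,
// independent of which driver produced the rows.
func collect[T any](rows Rows, scanRow func(Rows) (T, error)) ([]T, error) {
    var out []T
    for rows.Next() {
        v, err := scanRow(rows)
        if err != nil {
            return nil, err
        }
        out = append(out, v)
    }
    return out, rows.Err()
}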

DevX86 commented 2 months ago

@go-jet Is this possible in any capacity? We need it to get rid of database/sql.

DevX86 commented 2 months ago

Scany seems like the way to go, but I love this library and want to see it flourish.

houten11 commented 2 months ago

@DevX86 Why do you want to get rid of database/sql?

DevX86 commented 2 months ago

Massive speed improvements: even selecting 15,000 records I saw a 3x improvement, due to the binary transfer and native bindings.

DevX86 commented 2 months ago

I'm trying to refactor onto scany; my friends, we were spoiled by go-jet. I can't live without it, especially when it comes to complex queries. @go-jet @houten11 Is there honestly nothing we can do for native pgx (no database/sql)?

body20002 commented 2 months ago

You can actually use them together: use go-jet to generate the SQL and pgx to execute it. Here's a simple example:

selectStmt := SELECT(Table.AllColumns).FROM(Table).WHERE(Table.ID.EQ(Int(1))).LIMIT(1)

func query[T any](ctx context.Context, db *pgxpool.Pool, stmt Statement) (*T, error) {
    query, args := stmt.Sql()

    // pgx allows ignoring the error from Query: it is surfaced again by the
    // returned Rows, which CollectOneRow checks before returning.
    rows, _ := db.Query(ctx, query, args...)
    return pgx.CollectOneRow(rows, pgx.RowToAddrOfStructByPos[T])
}

// and just use it like this
data, err := query[Table](ctx, pool, selectStmt)
DevX86 commented 2 months ago

@body20002

I need something for complex structs like

type A struct {
    B struct {
        C
        D E `db:"F"`
        G H `db:"I"`
        J []struct {
            K
            L struct {
                M
                N struct {
                    O `alias:"P"`
                    Q []struct {
                        R `alias:"S"`
                    }
                }
            } `db:"T"`
            U []V `db:"W"`
        } `db:"X"`
        Y Z  `db:"AA"`
        AB AC `db:"AD"`
    }
}

I worked on something like that for hours last night combining scany with go-jet, and I don't like the results. I flatten the struct and then parse it like this.

var resultViews []views.ResultView
for _, tempGroup := range groupedTemps {
    firstTemp := tempGroup[0]

    // Extract embedded structs
    modelA := utils.ExtractEmbeddedStruct[models.ModelA](firstTemp)
    modelB := utils.ExtractEmbeddedStruct[models.ModelB](firstTemp)
    modelC := utils.ExtractEmbeddedStruct[models.ModelC](firstTemp)
    modelD := utils.ExtractEmbeddedStruct[models.ModelD](firstTemp)
    modelE := utils.ExtractEmbeddedStruct[models.ModelE](firstTemp).Resolve()

    // Collect items and build views
    modelFItems := utils.ExtractEmbeddedSlice[Temp, models.ModelF](tempGroup)
    modelGItems := utils.ExtractEmbeddedSlice[Temp, models.ModelG](tempGroup)
    modelHItems := utils.ExtractEmbeddedSlice[Temp, models.ModelH](tempGroup)
    modelIItems := utils.ExtractEmbeddedSlice[Temp, models.ModelI](tempGroup)

    // Continue processing with the extracted data
    // For example, building a result view:
    resultView := views.NewResultView(
        &modelA,
        modelFItems,
        &modelB,
        &modelC,
        modelE,
        // ... other parameters
    )

    resultViews = append(resultViews, *resultView)
}

And this one, which has a struct plus a slice; but we have structs in slices in structs, etc., so this won't scale.

type ProjectData struct {
    models.Project       `db:"projects"`
    ProjectRecords       []models.ProjectRecord `db:"project_records" json:"project_records"`
}
func (r *ResearchRepository) GetFilteredProjectData(ctx context.Context, researcherID uuid.UUID, projectID *uuid.UUID) ([]views.ProjectData, error) {
    ctx, span := researchTracer.Start(ctx, "ResearchRepository.GetFilteredProjectData")
    defer span.End()

    // SQL statement builder.
    // Build the filter first, so the optional project filter is ANDed in
    // instead of replacing the researcher filter via a second WHERE call.
    condition := researcherTable.ID.EQ(sqlbuilder.UUID(researcherID))
    if projectID != nil {
        condition = condition.AND(projectTable.ID.EQ(sqlbuilder.UUID(*projectID)))
    }

    stmt := sqlbuilder.
        SELECT(
            projectTable.AllColumns,
            projectRecordTable.AllColumns).
        FROM(researcherTable.
            INNER_JOIN(projectTable, projectTable.ResearcherID.EQ(researcherTable.ID)).
            INNER_JOIN(projectDetailTable, projectDetailTable.ProjectID.EQ(projectTable.ID)).
            INNER_JOIN(projectRecordTable, projectRecordTable.ProjectID.EQ(projectTable.ID))).
        WHERE(condition)

    fmt.Println(stmt.DebugSql())

    // Define a temporary structure to hold results
    type Temp struct {
        models.Project       `db:"projects"`
        models.ProjectRecord `db:"project_records"`
    }

    // Execute the parameterized query; DebugSql inlines values and is for logging only
    query, args := stmt.Sql()

    var temps []Temp
    if err := pgxscan.Select(ctx, r.db, &temps, query, args...); err != nil {
        return nil, utils.SpanErr(span, err)
    }

    groupedTemps, err := utils.GroupByID(temps)
    if err != nil {
        return nil, err
    }

    var projectDataList []views.ProjectData
    for _, tempGroup := range groupedTemps {
        firstTemp := tempGroup[0]

        projectData := views.ProjectData{
            Project:        utils.ExtractEmbeddedStruct[models.Project](firstTemp),
            ProjectRecords: utils.ExtractEmbeddedSlice[Temp, models.ProjectRecord](tempGroup),
        }

        projectDataList = append(projectDataList, projectData)
    }

    return projectDataList, nil
}
DevX86 commented 2 months ago

@go-jet For reference, scany can't handle maps or nested structs, only flat structs and flat slices.

dropwhile commented 2 months ago

@DevX86 Do you really need scany? As @body20002 pointed out in their example, since v5 pgx has nice scanning support of its own (CollectRows, CollectOneRow, RowToStructByName, RowToStructByNameLax, etc.). Reference: https://ectobit.com/blog/pgx-v5-3/

I used to use scany, and just converted to using plain pgx after v5 came out.
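For example, a minimal sketch with plain pgx v5 (the User type and query are placeholders):

// CollectRows maps each row onto a struct by matching column names to
// field names; db struct tags are honored too.
type User struct {
    ID   int64  `db:"id"`
    Name string `db:"name"`
}

rows, _ := pool.Query(ctx, "SELECT id, name FROM users")
users, err := pgx.CollectRows(rows, pgx.RowToStructByName[User])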

body20002 commented 2 months ago

@DevX86 I haven't tried it on complex queries, and certainly not on a query that complex, but it should work with nested structs, provided the struct corresponds to the query's return values.

I think the pgx docs say so.

DevX86 commented 2 months ago

@dropwhile @body20002 Hate to ping you guys, but I'm in the middle of a major refactor of our database layer. As long as it's pgx-native I don't mind which I use (or something new); my main concern is these complex queries. Can you post an example of a nested struct with slices in it?

go-jet commented 2 months ago

@DevX86 I'll look into pgx when I find some time. My intuition tells me that even if pgx is 3x faster than pq with database/sql, in a real production environment it will never make a significant difference, because latency between user and server, and between server and database, is measured in ms, while scanning into Go structures is measured in ns. For example, whether a scan takes 200 ns or 600 ns hardly matters when the entire response to the user takes more than 10 ms (tens of thousands of times the difference).

Now, this assumes the database returns a reasonable number of rows. What counts as reasonable depends on the setup, but a rough estimate is no more than 1000 rows, preferably fewer than 100, over no more than 10 columns. If your queries return 15,000 rows, grouping them into the result will take time regardless of the database driver used. By joining multiple tables in a single query you can easily end up with a huge number of rows, in which case you'll need to consider pagination, or even split the query into two or more queries.
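For instance, a minimal pagination sketch with jet (table, column, and cursor names are hypothetical):

// Keyset pagination: fetch the next page of 100 rows after the last seen ID,
// instead of materializing one huge joined result.
stmt := SELECT(Project.AllColumns).
    FROM(Project).
    WHERE(Project.ID.GT(Int(lastSeenID))).
    ORDER_BY(Project.ID.ASC()).
    LIMIT(100)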

DevX86 commented 2 months ago

@go-jet Thanks for the response. My normal queries, including server response time, are 2-3x faster (some around 10-30%, but those add up). I used 15k rows as an initial test. The joins are for our deeply nested data types; otherwise it ends up being a ton of requests to the backend and constructing nested objects out of many flat ones.

DevX86 commented 2 months ago

Also, we get some cool things like better connection pooling. I can't decide if I should wait on the ideas ahead or continue the refactor using the odd bridge I built; it's going to take a lot of man-hours and I'm not sure about the maintainability for other developers. For integration simplicity I also had to take the fields directly off the generated struct, so that's another pitfall in making this work 'seamlessly' with scany. I'm sure there's a way to patch it, but I've put many hours and much headache into this; I'm just glad it's working at all.

We have ~1,000 queries in go-jet. Since day one of discovering it I've loved this library, and it's one of the few allowed into the codebase. If it's feasible, please make this happen @go-jet

DevX86 commented 2 months ago

I guess the bridge isn't horrendous with some renaming.

And changes like this for usage consistency.

// ExtractStruct extracts the first struct of type K from a slice of T.
func ExtractStruct[T any, K any](items []T) K {
    if len(items) == 0 {
        var zero K
        return zero
    }

    // Extract the first item
    firstItem := items[0]

    // Use reflection to extract the embedded struct of type K
    return ExtractEmbeddedStruct[K](firstItem)
}

The main difficulty I'm seeing with the bridge implementation is the mental coalescing of flat entities into a struct, rather than thinking in terms of the end goal ("here's the struct I need"). If anyone is interested, I could do more battle testing on this and create a utility library for bridging go-jet and scany.

DevX86 commented 2 months ago

Here's the solution for the scany implementation; it also works with encapsulated structs. If needed, modify your generator to include this:

// Tables.
UseTable(func(table metadata.Table) template.TableModel {
    return template.DefaultTableModel(table).
        UseField(func(columnMetaData metadata.Column) template.TableModelField {
            defaultTableModelField := template.DefaultTableModelField(columnMetaData)

            dbTag := columnMetaData.Name
            return defaultTableModelField.UseTags(
                fmt.Sprintf(`db:"%s"`, dbTag),
            )
        })
}).
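For context, a sketch of where such an override plugs into a generator run, assuming the standard go-jet generator template API (treat the exact signatures as an assumption and check the generator wiki):

// postgres is github.com/go-jet/jet/v2/generator/postgres,
// postgres2 is github.com/go-jet/jet/v2/postgres (the dialect).
genTemplate := template.Default(postgres2.Dialect).
    UseSchema(func(schema metadata.Schema) template.Schema {
        return template.DefaultSchema(schema).
            UseModel(template.DefaultModel().
                // the UseTable/UseField override from above goes here
                UseTable(func(table metadata.Table) template.TableModel {
                    return template.DefaultTableModel(table).
                        UseField(func(column metadata.Column) template.TableModelField {
                            return template.DefaultTableModelField(column).
                                UseTags(fmt.Sprintf(`db:"%s"`, column.Name))
                        })
                }))
    })

err := postgres.Generate(destDir, dbConn, genTemplate)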
DevX86 commented 2 months ago

Due to the issue below, a bridge is less ideal: we would have to make duplicate structs with nullable fields, plus some sort of Resolve() function with the jet model as an embedded struct, which returns either the normal struct or nil. I'm currently doing that, but at scale it won't be fun. I'm considering having go-jet generate all tables again with all fields as pointers, as NullTableName, but that's a lot of duplication.

Scany Left Join Issue
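A sketch of the Resolve pattern described above (NullProject and models.Project are hypothetical stand-ins, not generated code):

// NullProject mirrors the generated model with pointer fields, so a LEFT JOIN
// that matches nothing scans cleanly as all-nil values.
type NullProject struct {
    ID   *uuid.UUID
    Name *string
}

// Resolve collapses the nullable variant back into the normal model,
// returning nil when the join produced no row.
func (n NullProject) Resolve() *models.Project {
    if n.ID == nil {
        return nil
    }
    p := models.Project{ID: *n.ID}
    if n.Name != nil {
        p.Name = *n.Name
    }
    return &p
}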

veqryn commented 2 months ago

As nice as Scany and Ksql are, it would be much better if Jet supported pgx natively. Scany and Ksql have their own expectations of what the struct is supposed to look like; Jet creates its own struct table models, so it is best placed to be the library scanning rows into them. It sounds like using your own interface internally, and then accepting both sql.Rows and pgx.Rows, would make them both work.

DevX86 commented 2 months ago

@veqryn Yeah, scany's concept of scanning just isn't built for complexity the way go-jet is. I just found a solution for, hopefully, the meantime, which isn't so bad once you migrate a few queries. Native support would be awesome for this project, and it honestly may attract new people who are looking for an alternative solution or need to handle more complex structs.

safaci2000 commented 1 month ago

So, I'm digging into the pgx vs database/sql API differences. pgx exposes a few stdlib helpers, and I'm assuming that both:

stdlib.OpenDBFromPool(dbPool), which returns a *sql.DB, and
sqlx.NewDb(stdlib.OpenDBFromPool(instance.dbPool), "pgx"), which returns a DB that embeds sql.DB,

are not sufficient for your use case? I think both of these return an sql.DB equivalent that can be used with Jet.

DevX86 commented 1 month ago

@safaci2000 Admittedly I'm not that smart; are you saying the native version of pgx can be used with go-jet? They have a database/sql version and a native version which is much more performant.

safaci2000 commented 1 month ago

@safaci2000 Admittedly I'm not that smart; are you saying the native version of pgx can be used with go-jet? They have a database/sql version and a native version which is much more performant.

I'm not sure. My point is that there seems to be a pattern you can use that still runs on the pgx driver. Since you seem very motivated to use that driver, I was pointing out something to try, to see whether you get the performance you need.

kblomster commented 1 month ago

If you just need arrays, the hack below might work for now. Note that this still uses pgx under the hood, but via the database/sql interface.

// Array represents a postgres array containing elements of type T.
// We need this Scan implementation below because pgx is kinda awkward when used via database/sql; see e.g.
// https://github.com/jackc/pgx/issues/1458
// https://github.com/jackc/pgx/issues/1662
// https://github.com/jackc/pgx/issues/1779
type Array[T any] []T

func (a *Array[T]) Scan(src any) error {
    // implementation from https://github.com/jackc/pgx/issues/1779#issuecomment-1787125955
    m := pgtype.NewMap()

    v := (*[]T)(a)

    t, ok := m.TypeForValue(v)
    if !ok {
        return fmt.Errorf("cannot convert to sql.Scanner: cannot find registered type for %T", a)
    }

    var bufSrc []byte
    if src != nil {
        switch src := src.(type) {
        case string:
            bufSrc = []byte(src)
        case []byte:
            bufSrc = src
        default:
            // fall back to the string representation of src
            bufSrc = []byte(fmt.Sprint(src))
        }
    }

    return m.Scan(t.OID, pgtype.TextFormatCode, bufSrc, v)
}

In future versions of Go, this should no longer be necessary, because this proposal was accepted (although the implementation hasn't been merged yet): https://github.com/golang/go/issues/67546
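Hypothetical usage with jet, assuming the generated model field for a bigint[] column is overridden to the Array type above (PostModel and Post are placeholders):

// jet's result mapping goes through database/sql, which invokes Array.Scan
// (the sql.Scanner interface), so the array lands directly in a typed slice.
type PostModel struct {
    ID     int64
    TagIDs Array[int64] // column tag_ids bigint[]
}

var posts []PostModel
err := SELECT(Post.AllColumns).FROM(Post).Query(db, &posts)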

DevX86 commented 1 month ago

@go-jet Is there anything I can do to help with this feature even if it's just research?

safaci2000 commented 1 month ago

@go-jet Is there anything I can do to help with this feature even if it's just research?

Have you tried the pattern I gave you last week? You seem concerned about performance; try using jet with those patterns and see whether the performance meets your expectations for pure pgx.

DevX86 commented 1 month ago

@go-jet Is there anything I can do to help with this feature even if it's just research?

Have you tried the pattern I gave you last week? You seem concerned about performance; try using jet with those patterns and see whether the performance meets your expectations for pure pgx.

Not sure how to approach that, can you explain in more detail?

safaci2000 commented 1 month ago

Not sure how to approach that, can you explain in more detail?

Well, to quote an earlier message from you:

Massive speed improvements: even selecting 15,000 records I saw a 3x improvement, due to the binary transfer and native bindings.

Re-run that test and see if you can get the same performance.

Just have something along these lines in the code.

    dbPool, err := pgxpool.New(ctx, dbURI)
    if err != nil {
        log.Fatal().Err(err).Msgf("Unable to create a pgx pool")
    }
    db := stdlib.OpenDBFromPool(dbPool)
    stmt.Query(db, &captureModel) // use jet to invoke the query

Obviously you should save the dbPool and not create a new one on every call.

DevX86 commented 1 month ago

Not sure how to approach that, can you explain in more detail?

Well, to quote an earlier message from you:

Massive speed improvements: even selecting 15,000 records I saw a 3x improvement, due to the binary transfer and native bindings.

Re-run that test and see if you can get the same performance.

Just have something along these lines in the code.

  dbPool, err := pgxpool.New(ctx, dbURI)
  if err != nil {
      log.Fatal().Err(err).Msgf("Unable to create a pgx pool")
  }
  db := stdlib.OpenDBFromPool(dbPool)
  stmt.Query(db, &captureModel) // use jet to invoke the query

Obviously you should save the dbPool and not create a new one on every call.

I was excited about this solution, but it's half the speed of native pgx (the response took 2x as long).

DevX86 commented 3 weeks ago

@go-jet Any findings on this?

go-jet commented 3 weeks ago

No, I haven’t had the opportunity to look into this yet.

go-jet commented 3 weeks ago

@DevX86 In the meantime, you can check out Prepared Statements Caching; it might give you some performance improvement.
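A rough sketch of that approach, assuming jet's stmtcache wrapper works as described on the wiki page of the same name (treat the exact API as an assumption and check the docs):

import "github.com/go-jet/jet/v2/stmtcache"

// Wrap the *sql.DB; jet then prepares each distinct query once and
// reuses the prepared statement on subsequent executions.
db := stmtcache.New(sqlDB)
defer db.Close()

err := stmt.QueryContext(ctx, db, &dest)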

DevX86 commented 2 weeks ago

I had never seen this; it's pretty cool. I've migrated over 100 queries to go-jet + pgx + scany, and I'll use this for anything unmigrated. Thank you @go-jet!