JordanMarr / SqlHydra

SqlHydra is a suite of NuGet packages for working with databases in F# including code generation tools and query expressions.
MIT License
212 stars 20 forks source link

SQLHydra's maxBy causes SqlHydra code to fail if querying an empty table #53

Closed mjdupontMEA closed 10 months ago

mjdupontMEA commented 1 year ago

Noticed this when trying to use a maxBy on an empty table. We were able to work around with a sortByDescending into take 1, (we also could grab a count and branch on whether that returned anything) but wanted to suggest it might be a bit more intuitive to have maxBy handle the null without breaking; perhaps by returning an option?

This problem was encountered using SqlHydra.SqlServer.

I believe this should be reproducible by creating a database with an empty table, creating the schema for said table, and querying for a maxBy on one of the fields. Our particular field was a dateTime2(6) if that is relevant.

JordanMarr commented 1 year ago

Can you give me the generated query along with the preferred query?

mjdupontMEA commented 1 year ago

Sorry for the delay in getting back to this. I think I understand what you're looking for.

For a bit of context, I'm writing this on behalf of my team at work.

We encountered this issue when we were attempting to use maxBy to retrieve a field of a record; specifically, a non-nullable datetime2(6). In our particular use case, the table we were querying had no records. If I recall correctly, we suspect that maxBy is returning a Result containing a null; SqlHydra is then trying to convert that null into a non-nullable value to match the column type that we were retrieving, and then failing. Our suspicion would be SqlHydra.Read() as to where this error is occurring.

I'm grabbing snippets of the code we're using; an example of what worked, an example of what didn't, a few helper functions, and the autogenerated code for the dbo.

Our code was something similar to this:

module HelperFunctions =
  let openContext connStr () = 
    let compiler = SqlKata.Compilers.SqlServerCompiler()
    let conn = new SqlConnection(connStr)     new QueryContext(conn, compiler)

  /// Shortcut for configuring SqlHydra to use a new SQL Server connection with the given conn string
  /// Instead of e.g., `deleteAsync (ContextType.Create (openContext connStr))`, you would put
  /// `deleteAsync (createContext connStr)`
  let createContext connStr = ContextType.Create (openContext connStr)

  let connString = "..."
open HelperFunctions

///////////////////////////////////

module dbo = 
      type weather_history =
        { weather_station: string
          [<SqlHydra.ProviderDbType("DateTime2")>]
          time: System.DateTime
          weather_instrument: byte
          weather_value: double
          [<SqlHydra.ProviderDbType("DateTime2")>]
          entry_timestamp: System.DateTime }

    type weather_historyReader(reader: Microsoft.Data.SqlClient.SqlDataReader, getOrdinal) =
        member __.weather_station = RequiredColumn(reader, getOrdinal, reader.GetString, "weather_station")
        member __.time = RequiredColumn(reader, getOrdinal, reader.GetDateTime, "time")
        member __.weather_instrument = RequiredColumn(reader, getOrdinal, reader.GetByte, "weather_instrument")
        member __.weather_value = RequiredColumn(reader, getOrdinal, reader.GetDouble, "weather_value")
        member __.entry_timestamp = RequiredColumn(reader, getOrdinal, reader.GetDateTime, "entry_timestamp")

        member __.Read() =
            { weather_station = __.weather_station.Read()
              time = __.time.Read()
              weather_instrument = __.weather_instrument.Read()
              weather_value = __.weather_value.Read()
              entry_timestamp = __.entry_timestamp.Read() }

        member __.ReadIfNotNull() =
            if __.weather_station.IsNull() then None else Some(__.Read())

///////////////////////////////

let thisFails = 
  selectAsync HydraReader.Read (createContext connString) {
    for record in table<dbo.weather_history > do
    where (record.weather_instrument = 0uy)
    select (maxBy record.entry_timestamp) }

let thisIsWhatWeUsed = 
  selectAsync HydraReader.Read (createContext connString) {
    for record in table<dbo.weather_history > do
    where (record.weather_instrument = 0uy)
    orderByDescending record.entry_timestamp 
    take 1
    select record.entry_timestamp }
    |> Async.map Array.tryHead

thisFails throws an exception, rather than returning an empty array.

We'll work on getting a more thorough/reproducible example on this issue.

JordanMarr commented 1 year ago

So it looks like SqlKata (which creates the actual SQL queries), avoids adding extra query manipulators in general. See this thread: https://github.com/sqlkata/querybuilder/issues/439

To elaborate, I think the SqlKata approach is to be more of a strongly typed SQL DSL than a "magic" ORM that injects random fixes behind the scene on behalf of the user. I think SqlKata has the right idea here because that would likely be an endless rabbit hole, and the user would never quite know what SQL was being generated behind the scenes.

If a fix were implemented on behalf of the user, it would be complicated because every SQL provider would likely have a different way of handling it. Since SqlKata doesn't want to go here, I don't think I want to either.

The only other possible point to handle this would be at the generated HydraReader code that handles reading individually selected tuple results. That code doesn't really know much about the intentions of the query, so I'm not sure how I could know to handle for null. It's all very generic at that point.

If you were writing the SQL manually, you would still get a null, and the fix would be to write the appropriate SQL. I think what you have done is exactly that, which seems appropriate.

ntwilson commented 10 months ago

Hey @mjdupontMEA (and anyone else stumbling across this), I discovered a workaround without needing modifications. From your earlier example:

let thisFails = 
  selectAsync HydraReader.Read (createContext connString) {
    for record in table<dbo.weather_history > do
    where (record.weather_instrument = 0uy)
    select (maxBy record.entry_timestamp) }

if you change it to

let thisWorks = async {
  let! result = selectAsync HydraReader.Read (createContext connString) {
    for record in table<dbo.weather_history > do
    where (record.weather_instrument = 0uy)
    select (maxBy (Some record.entry_timestamp)) 
    tryHead
  }
  return Option.flatten result
}

it'll work properly because by selecting the max of Some record.entry_timestamp, it makes SqlHydra treat the type as an Option, and it successfully parses DBNull instead of crashing. I wonder if this is worth including in the docs by aggregates? It seems like a fairly common use case where you select an aggregate that might return NULL...

JordanMarr commented 10 months ago
let thisWorks = async {
  let! result = selectAsync HydraReader.Read (createContext connString) {
    for record in table<dbo.weather_history > do
    where (record.weather_instrument = 0uy)
    select (maxBy (Some record.entry_timestamp)) 
    tryHead
  }
  return Option.flatten result
}

it'll work properly because by selecting the max of Some record.entry_timestamp, it makes SqlHydra treat the type as an Option, and it successfully parses DBNull instead of crashing. I wonder if this is worth including in the docs by aggregates? It seems like a fairly common use case where you select an aggregate that might return NULL...

Very creative fix! I think that would definitely be helpful to have in the readme! Also, table names are now generated. See notes for this release.