fsprojects / SQLProvider

A general F# SQL database erasing type provider, supporting LINQ queries, schema exploration, individuals, CRUD operations and much more besides.
https://fsprojects.github.io/SQLProvider
Other
579 stars 146 forks source link

Unrestricted Updates/Inserts/Deletes #417

Open nicholas-peterson opened 7 years ago

nicholas-peterson commented 7 years ago

I noticed recently some issues/pr's in a similar vein, namely one relating to unrestricted deletes, and another about comparing two columns both on the query side (rather than the left being an F# value).

I'd like to be able to do something equivalent to the following:

UPDATE A
    SET A.COLUMN1 = 'FSHARP VALUE'
WHERE A.COLUMN2 = 'ANOTHER FSHARP VALUE'

or, more involving:

UPDATE A
    SET A.COLUMN1 = B.COLUMN2
FROM TABLEA AS A
INNER JOIN TABLEB AS B
    ON A.KEY = B.KEY
WHERE A.COLUMN3 = 'FSHARP VALUE'

edit for some other examples:

INSERT INTO TABLEA
SELECT COLUMN1,COLUMN2,'FSHARP VALUE1'
FROM TABLEB
WHERE TABLEB.COLUMN1 = 'FSHARP VALUE2'
DELETE TABLEA
FROM TABLEA
INNER JOIN TABLEB
    ON TABLEA.KEY =TABLEB.KEY
WHERE TABLEB.COLUMN1 = 'FSHARP VALUE1'

Basically, I want to push as much of the operational semantics to the SQL layer as possible, influencing the query only with some values from F#, mostly to constrain joins/where/group/having clauses. The goal is to stop having to produce UDF/PROC's in native T-SQL in order to perform set based updates/inserts.

Is something like this already possible? Is it antithetical to the intent of this type provider?

Thorium commented 7 years ago

I kind of like the idea. The problem is more the design of .NET LINQ, which is mostly meant for select-queries only.

While fetching data is quite standard SQL, the rules of insert/update/delete does vary more. To keep things composable we should use expressions and not direct functions, but how to express those queries with LINQ?

I did already face some issues when I did PR #404. Maybe there should be a separate class for these kind of items, and not to put them under Seq-module.

nicholas-peterson commented 7 years ago

Let me begin this response by stating I understand how incredibly non-trivial any of this would likely be to implement. Not to mention across multiple database back-ends. I also know about the FSharp.Data.SqlClient which attempts to support this more directly, but it only works for T-SQL.

Below are some suggestions of syntax in linq to support some generic insert/update/delete operations. I'm uncertain of what is possible to accomplish inside computation expressions so some of the later suggestions may be entirely unworkable. The goal is support 'just-enough' syntax to allow some common set-oriented scenarios to not require explicitly created stored procs/User defined functions. Obviously as more exotic operations come into play eventually one will have to drop into straight SQL, but the current behavior of the type provider pretty much precludes set operations. Which means anything outside basic one-record crud statements will become slow at scale.

query {
    for employee in context.dbo.Employees do
    join skill in context.dbo.Skills on (employee.EmpId = skill.EmpId)
    groupBy (employee.EmpId) into employeeSkills
    select count
    update (fun e -> 
        employee.TotalSkills <- e)
}

would transform into

UPDATE employee
    SET employee.TotalSkills = COUNT(*)
FROM dbo.Employees AS employee
INNER JOIN dbo.Skills AS skill
    ON employee.EmpId = skill.EmpId
GROUP BY employee.EmpId

The value returned from executing this statement would be the implicit rowcount. So in essence after the query you would issue an |> Seq.toList |> ignore if you don't care about the result.

Something like an insert would look like this:

query {
    for employee in context.dbo.Employees do
    where (employee.HireDate < DateTime.Now.AddYears(-1)
    select employee
    insert vestedEmployee in context.dbo.VestedEmployees (fun e ->
        vestedEmployee.EmpId <- e.EmpId
        vestedEmployee.HireDate <- e.HireDate
        vestedEmployee.RecordCreatedDate <- DateTime.Now
        )
}

would transform into

INSERT INTO dbo.VestedEmployees (EmpId, HireDate, RecordCreatedDate)
SELECT EmpId, HireDate, '2017-05-19'
FROM dbo.Employees AS employee
WHERE employee.HireDate < DATEADD(year,-1,GETDATE())

Theoretically, an insert could be based off a derived sequence:

query {
    for employee in [("1234",DateTime.Now.AddYears(-1), DateTime.Now); ("1235", DateTime.Now.AddYears(-2), DateTime.Now)] do
    select employee
    insert vestedEmployee in context.dbo.VestedEmployees (fun e -> 
        let empId, hireDate, currentDate = e
        vestedEmployee.EmpId <- empId
        vestedEmployee.HireDate <- hireDate
        vestedEmployee.RecordCreatedDate <- currentDate)
}

would transform into

INSERT INTO dbo.VestedEmployees (EmpId, HireDate, RecordCreatedDate) 
VALUES 
    ('1234', DATEADD(year,-1,GETDATE(), GETDATE()),
    ('1235',  DATEADD(year,-2,GETDATE(), GETDATE())

DELETE would follow a similar pattern/strategy.

Sometimes queries need other DML statements to happen server-side beforehand:

query {
    let mutable hireDateCutoff = DateTime.Now.AddYears(-1)
    for x in dbo.Employees do 
    where (x.HireDate < hireDateCutoff)
    select x
}

transforms into:

DECLARE @hireDateCutoff Datetime = DATEADD(year, -1, GETDATE());

SELECT *
FROM dbo.Employees AS x
WHERE x.HireDate < @hireDateCutoff

Sometimes you need to issue two queries in sequence to get two different values.

query {
    let mutable totalEmployees = 0
    let mutable totalSkills = 0

    for x in dbo.Employees do
    select 
    count
    (fun y -> 
        totalEmployees <- y)

    for x in dbo.Skills do
    select
    count 
    (fun y ->
        totalSkills <- y)

    for tally in [totalEmployees;totalSkills] do
    select tally

} Seq.toList 

would transform into:

DECLARE @totalEmployees INT = 0;
DECLARE @totalSkills INT = 0;

SET @totalEmployees = SELECT COUNT(*) FROM dbo.Employees;
SET @totalSkills = SELECT COUNT(*) FROM dbo.Skills;

SELECT @totalEmployees, @totalSkills;

Theoretically one could also extend these clauses (insert/update/delete) with a subsequent 'output' clause to assign the output of the query to a declared variable (like a table variable). The goal of this would be to enable subsequent use in another statement, or perhaps to return identity values from an operation.