JeremyLikness / jeremylikness-blog

Developer for Life Blog
https://blog.jeremylikness.com
16 stars 7 forks source link

I want to write C# on my dbms #1

Closed sam-wheat closed 4 years ago

sam-wheat commented 4 years ago

Hi Jeremy, please see my post here.

JeremyLikness commented 4 years ago

Read it, thanks!

leaderanalytics commented 4 years ago

Thanks Jeremy, I would like to know your thoughts on what I proposed. I know it is a bit of a reach - but so was Blazor when it was proposed for the very first time. Can what I asked for be done?

JeremyLikness commented 4 years ago

I'm not sure what problem that is trying to solve. "Performance is better" is reasonable if you can't get adequate performance, but I'm not sure that's the case. If you want a relational model you need to use a relational language or you're going to get hit in translation, so the LINQ still has to get translated. If you want a document model, those already exist (MongoDB, RavenDB, Cosmos DB, etc.). If you open the EF Core code and pick through the Query parts I think you'll see why the notion of LINQ is a good one but in practice translating between types and relationships is a complex and nontrivial exercise, regardless of where the code lives.

leaderanalytics commented 4 years ago

Thanks for the second look at this. I was a little long winded in my post.

I'm not sure what problem that is trying to solve.

In fewest words: I want to omit converting my LINQ statements to SQL. I want a DBMS that understands and executes LINQ natively without first generating SQL: Customer c = db.customers.where(x => x.ID == 10).First(); // DBMS parses and executes this. No SQL statement generated

I also want to be able to do this:

Customer c = db.customers.where(x => MyMethod(x)).First();
private bool MyMethod(Customer c)
{
    return c.ID == 10;
}

where MyMethod exists in my service layer.

If you want a relational model you need to use a relational language or you're going to get hit in translation,

My question has nothing to do with relational vs document. I am asking to you to take MSSQL Server and make it understand LINQ without first generating a sql statement. Keep the tables, indexes etc exactly as they are.

Rather than write stored procs I want to upload my service layer i.e. MyMethod shown above as a nuget package to the DBMS. Can this be done?

JeremyLikness commented 4 years ago

If you built a new database from the ground up it is likely possible, but that's a lot of effort just to make LINQ the query language.

sam-wheat commented 4 years ago

Jeremy what I am asking for is much much more than just changing the query language.

Please remove from your mind all of the reasoning behind the way we write service layer code today. Instead of thinking about new ways to work around the limitations we currently face, ask yourself "What if the limitation itself could be removed?" Start with a blank sheet of paper and imagine what DBMS you would create as a dotnet developer that would make your life simple.

Please look again at this example:

Customer c = db.customers.where(x => MyMethod(x)).First();
private bool MyMethod(Customer c)
{
    return c.ID == 10;
}

The query above cannot be translated to SQL by Entity Framework because there is no way to express MyMethod in SQL. Imagine what it would be like if this limitation were removed and you could write your entire service layer in statically typed c# without worrying if the code could be executed on the server. The ability to do this is invaluable. It's game changing. It is much more than just a choice of query language. It is what EF is trying to achieve but in a roundabout and totally inefficient way.

EF is hobbled by the requirement that it first translate all queries to SQL before executing them. This has proven to be a formidable challenge. EF is over a decade old and still only of limited use. Please let me repeat that: Over ten years of tough development and we still only have a a brittle and fragile product that barely bridges the gap. The problem is not the EF team - those guys are really smart and really dedicated. The problem is they are trying to build a product that cannot bear the burden of its own weight.

Think about the bulk of the work that EF does: it translates your linq query to SQL. This requirement is not only an enormous technical challenge but it is also completely unnecessary. As an EF developer do you ever even look at the SQL that is produced by the majority of the queries you write? No, you probably don't. You only only look at the sql of about 20% of your queries. The rest of the time EF is spinning its wheels generating code that is never even looked at by the developer.

ANSI SQL as a platform is so embedded in developers minds that they cannot imagine a world without it. ANSI SQL presents a ton of problems (the term "Impedance mismatch" has survived all these years because its real) yet all we can think about is how to work around each problem. I believe the time has come to stop thinking of new workarounds. Let's get rid of SQL.

What if the ten years spent developing a workaround (EF) were instead invested in a technology that allowed us as dotnet developers to interact seamlessly with the DBMS right from managed code? What if we started from absolute zero and built a DBMS optimized to work with dotnet? What would that technology look like? What would it do? How would it be different? What inefficiencies would be overcome? Please tell me what you think.

JeremyLikness commented 4 years ago

But that's the point - LINQ itself doesn't do anything but build expression trees. It has to be translated to something! For example, if we have a query we can translate to SELECT ... WHERE then we know the database will return that record and use any indexes. What about a "LINQ native" database? Here's a challenge for you. Take this gist and run it:

https://gist.github.com/JeremyLikness/7cbf85231b5d75a5c3fa16d09ec1c4a4

That is pure LINQ. No translation to the database. Consider how inefficient it is. Even native, C# LINQ doesn't know how to index on the list of things and return it with a single operation. It has to scour objects one by one.

This is just in memory, no database. So for this to work with a database implementation, you would need it to work in pure LINQ as well. Are you able to refactor that gist so that LINQ "understands" the method being called well enough to return the right entity without parsing the list? Feel free to use a dictionary or hashset or any in-memory emulation of an index you desire. The requirement is that the code snippet you shared can run "as is" and return a single entity without parsing the expression tree or translating.

If someone can solve that, we might have a foundation to look into something deeper.

sam-wheat commented 4 years ago

Your point is well made - I understand that LINQ by itself does not make a DBMS. I understand that LINQ has to be translated. What I am suggesting here is that LINQ be translated to something other than ANSI SQL.

I have no idea exactly what that intermediate may be. I suppose it will borrow concepts like pages and extents and indexes and tables. I don't expect the concepts of joins and foreign keys will ever go away. The goal here is to remove the constraint from the server of parsing an ANSI standard SQL statement. This will remove many technical hurdles and open the door for much faster execution of a more object oriented query language like LINQ. I don't think the intermediate has to be a very friendly "language" either - as long a readable query plan can be produced it should be fine.

What if we were to take an approach like Blazor where there is a server side and a client side model.

Client side model might work much as it does today with EF. However instead of EF needing to contort itself every which way to create SQL, it will pass the expression tree or perhaps a string representation to the server. Since the server will understand LINQ it will not have to create SQL. Instead it can directly produce its own internal query that maps to data on disk.

Server side model might work like this:

Developer writes a class like the one below in a library project. The library is built and as part of the build process three things things happen:

1.) nuget package is created 2.) nuget is (optionally) deployed to dbms 3.) lightweight client component is generated

Class below gets packaged up in a nuget and sent to the server. Classes like this are effectively stored procedures written in c#. This class might be instantiated much like a http request in that it is newed up when a method is called than immediately disposed when done.

public class CustomerService : ICustomerService
{
    private Database db;

    public CustomerService(Database db)
    {
        this.db = db;  // db is the server process itself or some thin wrapper
    }

    public virtual bool SaveCustomer(Customer c)
    {
        //...
    }
    public virtual Customer GetCustomer(int ID)
    {
        return db.Customers.Where(x => MyMethod(x)).First();
    }

    private bool MyMethod(Customer c)
    {
        //...
    }
}

This class can be generated by the build process or created by hand by the developer. It just contains stubs that make calls to the server. This class can be injected into a ViewModel or controller:

public class ClientSide_CustomerService : ClientSideDatabase, ICustomerService
{
    public virtual bool SaveCustomer(Customer c)
    {
        return this.Database.SaveCustomer(c); // call the server
    }
    public virtual Customer GetCustomer(int ID)
    {
        return this.Database.GetCustomer(ID); // call the server
    }
}

Your thoughts?

JeremyLikness commented 4 years ago

I don't understand how this is different than existing solutions like OData and CLR stored procedures.

sam-wheat commented 4 years ago

If those are viable solutions than what is the need for technologies like EF? Why is it still under development?

JeremyLikness commented 4 years ago

OData relies on EF Core to talk to the database. It is viable because of EF Core, not in spite of it.

I have no idea why the CLR feature hasn't taken off, that would require some digging. I read that as a pretty strong signal that either there is not a strong demand for .NET on the server or there are issues with the implementation.

EF Core is under development because people like a consistent API to translate C# and LINQ to underlying database providers. It's not a SQL-centric solution, which is why it is used for PostsgreSQL, CockroachDB, MySQL, Oracle, and Cosmos DB among other things. It represents a desire by the community to have a solution that embraces an open ecosystem of databases over a proprietary, .NET-centric solution and/or SQL Server centric solution. It also exists because database concepts like partition keys, clustered indexes, views and stored procedures aren't part of native language specs (outside of SQL dialects) so it is necessary to use the native dialect or provide an adapter between dialects.

I get the idea conceptually and have shared why I believe it hasn't (and won't) be implemented concretely.