dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.71k stars 3.17k forks source link

Should consider saving any type as text or JSON text. #30361

Closed SF-Simon closed 1 year ago

SF-Simon commented 1 year ago

I have been using Swift to write iOS applications and Sqlite database. With the GRDB.swift library, I can easily complete the conversion and access between databases and entities. In particular, the GRDB.swift library is very simple and practical in handling arbitrary field storage.

At present, the learning cost of EF is really high for me. I think it is very complex and difficult to understand.

    public class MyHome
    {
        public int[] numbers { get; set; } = Array.Empty<int>();
        public House? mainHouse { get; set; }
        public House[] OtherHouses { get; set; } = Array.Empty<House>();

        public class House
        {
            public string ID = string.Empty;
            public string? Name;
        }
    }

In GRDB.swift library, Like this code, it will eventually generate only one "MyHome" table in sqlite, while the three attributes of Numbers, MainHouse and OtherHouses correspond to a text field respectively. The Numbers store arrays such as [1,2,3,4] and Others are stored as JSON text.

These fields are fields that do not need to be queried by criteria. (You need to write conversion functions and SQL to become query conditions).

Relatively more than the one-to-one or one-to-many restrictions of EF. This way not only meets the convenience of storage, but also reduces the number of tables. At the same time, it is very simple to understand.

My idea is whether EF can refer to such practices to reduce the difficulty and understanding of code writing, and also consider adding attribute types to limit the storage methods corresponding to the attributes of the class.

Current way: https://stackoverflow.com/a/34110125

public class Foo
{
    public Guid Id { get; set; }

    // Not Mapped attribute will make EF
    // ignore this property completely
    [NotMapped]
    public Bar BarObject { 
      get;
      set;
    }

    public string Bar{
       get{
          return JsonConvert.Serialize(BarObject);
       }
       set{
          BarObject = JsonConvert.Deserialize<BarObject>(value);
       }
    }
}

Why can't we simplify it?

thx.

roji commented 1 year ago

In GRDB.swift library, Like this code, it will eventually generate only one "MyHome" table in sqlite, while the three attributes of Numbers, MainHouse and OtherHouses correspond to a text field respectively.

For mainHouse and OtherHouses, it really isn't clear to me that mapping these to JSON/text columns in a single table is what most users are looking for. SQLite is a relational database, and the default strategy for mapping "related" things in relational database is generally via other tables and foreign keys... Databases have also optimizations for querying in this way: say you want to get all Houses in the system based on some criterion; if Houses are represented as text fields inside the MyHome table, that query is likely going to be very inefficient unless you take special care to set up some very special indexes. On the other hand, everything is much easier if each House is a row in a Houses table.

To summarize, if GRDB.swift really maps this to JSON/text by default, I'd find that quite a questionable choice; and I very much hope there's at least a way to instruct it to map to separate tables. In any case, if that's what you want, then EF does allow easily mapping to a JSON column (in SQL Server since 7.0, in SQLite support is coming in 8.0, and for PostgreSQL/MySql since EF 3.1 or so). For the SQL Server support, see JSON Columns.

numbers is a different case, since it's an array of a primitive/scalar value, rather than a full, complex type. I do think we should support this out-of-the-box - this is tracked by #29427 (note that PostgreSQL already supports this thanks to built-in array support in the database, so no need to create a string/JSON representation).

SF-Simon commented 1 year ago

In GRDB.swift library, Like this code, it will eventually generate only one "MyHome" table in sqlite, while the three attributes of Numbers, MainHouse and OtherHouses correspond to a text field respectively.

For mainHouse and OtherHouses, it really isn't clear to me that mapping these to JSON/text columns in a single table is what most users are looking for. SQLite is a relational database, and the default strategy for mapping "related" things in relational database is generally via other tables and foreign keys... Databases have also optimizations for querying in this way: say you want to get all Houses in the system based on some criterion; if Houses are represented as text fields inside the MyHome table, that query is likely going to be very inefficient unless you take special care to set up some very special indexes. On the other hand, everything is much easier if each House is a row in a Houses table.

To summarize, if GRDB.swift really maps this to JSON/text by default, I'd find that quite a questionable choice; and I very much hope there's at least a way to instruct it to map to separate tables. In any case, if that's what you want, then EF does allow easily mapping to a JSON column (in SQL Server since 7.0, in SQLite support is coming in 8.0, and for PostgreSQL/MySql since EF 3.1 or so). For the SQL Server support, see JSON Columns.

numbers is a different case, since it's an array of a primitive/scalar value, rather than a full, complex type. I do think we should support this out-of-the-box - this is tracked by #29427 (note that PostgreSQL already supports this thanks to built-in array support in the database, so no need to create a string/JSON representation).

Yes, if GRDB.swift converts it into JSON text storage by default according to the design thinking of EF, it is a very bad practice.

However, GRDB.swift is in line with the needs of a large number of simple database structure users, especially mobile client development projects. At least when designing databases and entity classes, it is simple and direct.

I have seen the writing method of "JSON Columns" in EF7.0, but I think it is not clear and simple enough (maybe I don't fully understand its writing method). For this reason, I wrote an attribute extension myself to complete this process.

My idea is to provide an out-of-the-box way, such as adding an attribute to a class or attribute, to make it easier to use. Because the demand depends on the needs of the actual project, I think it is advisable to provide more ways or more options. I think this is very necessary for cross-platform client development.

EF is one of my favorite ORMs. I think it is very powerful and advanced. But please forgive me for being frank. I think it would be more powerful if we could be more inclusive and refer to other solutions with actual needs, so as to cope with various actual development scenarios.

Anyway, this is just a little idea for me.

Thank you for your reply.

groue commented 1 year ago

For mainHouse and OtherHouses, it really isn't clear to me that mapping these to JSON/text columns in a single table is what most users are looking for. SQLite is a relational database, and the default strategy for mapping "related" things in relational database is generally via other tables and foreign keys... Databases have also optimizations for querying in this way: say you want to get all Houses in the system based on some criterion; if Houses are represented as text fields inside the MyHome table, that query is likely going to be very inefficient unless you take special care to set up some very special indexes. On the other hand, everything is much easier if each House is a row in a Houses table.

To summarize, if GRDB.swift really maps this to JSON/text by default, I'd find that quite a questionable choice; and I very much hope there's at least a way to instruct it to map to separate tables. In any case, if that's what you want, then EF does allow easily mapping to a JSON column (in SQL Server since 7.0, in SQLite support is coming in 8.0, and for PostgreSQL/MySql since EF 3.1 or so). For the SQL Server support, see JSON Columns.

Hello @roji, GRDB author here.

I'm in full agreement regarding the importance of a good relational schema, and GRDB documentation does not miss any opportunity to gently remind the users about this. "Design a proper schema first, and don't be shy on the time spent on this task!"

When a user defines a type intended to match database records (a "model", or a "record type") in the host programming language (C# here, Swift there), they choose the types of the properties of this model.

Properties with a scalar type, such as integers and strings, obviously fit database columns.

But what to do with properties with other types, such as arrays or other objects? The user has expressed the intent, with those properties, that some complex value should be stored in one column. In other words, the schema designed by the user contains unstructured data.

In order to support his use case, the choice of GRDB has been to map those types to JSON columns - objects or arrays. The reason for this choice is that JSON is gaining much traction in the world of databases. Many popular engines have great support for JSON, such as SQLite or PostgreSQL. You can even create indexes on values nested inside a JSON object, now!

By defaulting the serialization of "complex" types into JSON, I don't think GRDB has made the wrong choice. The most precious part of our story, I mean the database storage itself, remains in a good shape. Database engines have acknowledged that some users need to store unstructured data - and actually allow to process such data. Their commitment for JSON is the sign that things can only go better for JSON columns in the years to come, not worse.

roji commented 1 year ago

@groue

But what to do with properties with other types, such as arrays or other objects? The user has expressed the intent, with those properties, that some complex value should be stored in one column. In other words, the schema designed by the user contains unstructured data.

I'm not sure I agree with this.

First, there's nothing unstructured about this data - quite to the contrary. The .NET model above has a strongly-typed House class nested inside MyHome; House isn't any less structured than MyHome (which obviously is mapped to a table) - the .NET type just happens to be contained within another .NET type. True unstructured data would be represented by something that isn't strongly-typed in .NET, to allow for different schemas in the same column (e.g. just strings, or maybe JSON documents that can be traversed).

Second, I don't think there's any user intent here being expressed as to how House should be mapped to the database; all that's expressed is that a single MyHome has multiple Houses. I think we all agree that there are two valid, legitimate mapping strategies here: to a separate table with a foreign key (classical relational modeling), or to a column with JSON serialization.

The question really is what the default should be; EF Core defaults to relational mapping (when using a relational provider!), but a user can explicitly opt into a JSON column mapping instead. Doing it the other way around - as you're proposing - would mean treating a relation database as a NoSQL-first database, where things are JSON subdocuments and no table relationships exist by default, unless opted into it. I don't think that corresponds to how most users would like to map their data; after all, pure NoSQL databases do exist if you don't want relational - there's a reason someone chooses a relational database rather than a pure NoSQL one.

To summarize, the fact that relational databases do support JSON capabilities doesn't necessarily mean that everyone now wants to map everything as JSON.

groue commented 1 year ago

Ho yes, you're surely right as well.

I'm not discussing your design choices, of course. I'm just trying to help understand where the OP comes from. It happens that GRDB helps users express their intent very clearly, at the type level.

Types marked as "persistable" do write in a single database table - that's what they do, that's what "persistable" means in the library. Hence all properties of those persistable types target a single column. Complex properties included. JSON was chosen as the default encoding of those properties - users can of course override.

The types that represent graphs are not of the same kind, because they are fed from multiple tables, usually with the help of foreign keys. GRDB does not allow (yet?) to persist them, only decoding is ready-made. Complex properties of those types are decoded from other database tables, or from JSON columns, depending on the structure of the values loaded from the database (this works because property names match columns, and some columns are relationships - foreign keys):

// Reads and writes into the table for players
struct Player: FetchableRecord, PersistableRecord {
    var id: String
    var name: String
    var awards: [Award] // stored and decoded from JSON
}

// A (read-only) graph object that links a player and her games
struct Career: FetchableRecord {
    var player: Player // decoded from player table
    var games: [Game]  // decoded from game table
}

Eventually, there isn't any ambiguity, and the user intent is clear despite the lack of any explicit annotation or extra code. The library doesn't have to hesitate between relations and JSON mapping - and user can override when needed.

SF-Simon commented 1 year ago

We all use EF, so we all like this design mode and thinking. Therefore, it is not to deny it. GRDB's application scenarios are all clients. It is very clear about the user's positioning, so there are some simplified and even default methods that I think are understandable.

The main thing is that EF as a database management framework, EF needs to be inclusive and open in addition to ensuring its robustness. After all, whether it can satisfy users is the most important thing. So, if there is a simplified method, I will be very happy, and I will make appropriate choices according to the actual development needs.

I think @groue has the same attitude. I can see that he also likes such a fast way. Provide one more choice and give the user the right to choose.

groue commented 1 year ago

It took years for GRDB until this area of the api was stabilized. I think it is a success: convenient, versatile, without sacrificing schema robustness 🤓 It is very good when maintainers of database libraries or framework are very strict, and make sure the api does not foster sloppy db design. It takes time.

Maybe this little introduction to an external library was useful, who knows? Thanks for your time, both @SF-Simon and @roji.

SF-Simon commented 1 year ago

It took years for GRDB until this area of the api was stabilized. I think it is a success: convenient, versatile, without sacrificing schema robustness 🤓 It is very good when maintainers of database libraries or framework are very strict, and make sure the api does not foster sloppy db design. It takes time.

Maybe this little introduction to an external library was useful, who knows? Thanks for your time, both @SF-Simon and @roji.

I like GRDB very much. It really helps me a lot. I don't need to think about storage at all, even arrays, structures or more complex structures. thank you very much. I think you can try to persuade the designers of EF to make it easier. ha-ha!!

groue commented 1 year ago

I don't need to think about storage at all

Don't say that, you ruin all your efforts[^1]! 😱

What you mean is that once you have properly designed your schema, including JSON columns where appropriate, you are happy that they're supported out of the box 😉

[^1]: All maintainers of database libraries want the users to be as schema-obsessed as themselves - I surely am guilty of that 🙂

SF-Simon commented 1 year ago

Please allow me to abandon myself. 😉 I hope to illustrate the importance of GRDB and my dependence on it.

In general, I am currently completing similar work in EF through some extension code, but I still hope that it is "officially" recognized and supported. Because I will always use EF in the future, I certainly hope it will be more powerful and meet my expectations.

It seems to be beyond the scope of discussion. We should return to the discussion of programming.

roji commented 1 year ago

@SF-Simon what, concretely, do you think should be improved in EF, given that we already allow mapping any type to JSON (at least for SQL Server, docs)?

SF-Simon commented 1 year ago

I read the documentation, and actually made a demo. I tried to use the writing methods that are not in the documentation, such as arrays, List, structures, etc.

I find the whole process difficult to understand, unclear and complicated to use. Mainly, there are too many things I need to understand, and it is actually converting an object to text and storing it.

To put it simply, I think it is possible to simplify its coding, or to make the learning cost of this document not so high?

For example, can you add an Entity Properties, like [Column], [MaxLength] and so on. (As [JSONTextField(isDirect=true)])When an attribute of any type has specific characteristics, it means that it is automatically serialized into text (even Bytes), and the type of the column is also text.

All in all, I don't want to understand the design of EF too deeply, and I don't want to write a lot of code. I just want to simply complete the single-table storage requirements of an entity. Whether it is JSON, or Bytes, or whether it can be used in Linq, I think sometimes, I don't care about these.

roji commented 1 year ago

Serializing primitive arrays (ints, strings) is tracked by #29427, and a [Json] attribute is tracked by #28933.

Do you see any other concrete things we could to improve this?

SF-Simon commented 1 year ago

Serializing primitive arrays (ints, strings) is tracked by #29427, and a [Json] attribute is tracked by #28933.

Do you see any other concrete things we could to improve this?

Hello, @roji, I'm sorry, because my previous attempts were completed under EF7, there may be some plans that EF8 already has or things that I haven't understood.

I plan to try the beta version of EF8 again this weekend. After I understand and test it, I will write down my doubts and expectations in the process.

Wait for me a few days, thank you.

ajcvickers commented 1 year ago

EF Team Triage: Closing this issue as the requested additional details have not been provided and we have been unable to reproduce it.

BTW this is a canned response and may have info or details that do not directly apply to this particular issue. While we'd like to spend the time to uniquely address every incoming issue, we get a lot traffic on the EF projects and that is not practical. To ensure we maximize the time we have to work on fixing bugs, implementing new features, etc. we use canned responses for common triage decisions.