dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.73k stars 3.18k forks source link

Cosmos EF provider modifying JSON values with the "id" in identifier #28600

Closed MysteryAchievement closed 2 years ago

MysteryAchievement commented 2 years ago

We are using Azure Cosmos DB EF provider to access data in a Cosmos Db collection. The data is stored in a JSON format. Here is a sample record:

{
    "devid": "e2bff6d3b3ad88d2fcd6984c33152e36ff8a528c",
    "start": "2022-08-03T20:59:36.715Z",
    "current": "2022-08-03T21:16:17.594Z",
    "probes": [{"id": 1, "sp": 50, "temp": 0},{"id": 2, "sp": -1, "temp": -1},{"id": 3,"sp": -1,"temp": -1},{"id": 4,"sp": 66,"temp": -1}],
    "id": "4b0c1d5f-f9ab-4aed-9a21-986c1c4ed022",
    "_rid": "zPgxAPzFW4sZHQAAAAAAAA==",
    "_self": "dbs/zPgxAA==/colls/zPgxAPzFW4s=/docs/zPgxAPzFW4sZHQAAAAAAAA==/",
    "_etag": "\"00004b09-0000-0100-0000-62eae5a10000\"",
    "_attachments": "attachments/",
    "_ts": 1659561377
}

The weirdness occurs when we fetch the data from Cosmos. Here is the C# code":

            var sessions = await CosmosQueryableExtensions.FromSqlRaw(_dbContext.GrillSessions,
                                                              $"SELECT * FROM GrillSessionsCollection c WHERE c.devid = '{request.DeviceId}' AND 
                                                              c.begin = '{request.StartTime}'")
                                                              .OrderByDescending(s => s.begin)
                                                              .ToListAsync(cancellationToken);

where the models defined as follows:

public class Data
    {
        public string deviceid { get; set; }
        public DateTime start { get; set; }
        public DateTime current { get; set; }
        public List<Probe> Probes { get; set; }
    }

        public class Probe
    {
        public int id { get; set; }
        public int sp { get; set; }
        public int temp { get; set; }
    }

If the probes array is fully populated (all 4 are present in the array (as displayed in the JSON above), everything is cool. They come back in the order placed in the array order by id 1 thru 4.

When the array is sparsely populated (that is, the array has some combination of 1, 2 or 3 probes), then the array returned with the id values renumbered. For example, this JSON snippet shows probes 1 and 4 present:

"probes": [
        {
            "id": 1,
            "sp": 50,
            "temp": 0
        },
        {
            "id": 4,
            "sp": 66,
            "temp": -1
        }

but when the data comes back, it looks like this (represented as an List for illustration purposes as seen in the debugger) :

probes = {List<Probe>} Count = 2
   [0] = Probe
       id = {int} 1
       sp = {int} 50
       temp = {int} 0
   [1] = Probe
       id = {int} 2
       sp = {int} 66
       temp = {int} -1

Other examples include: if probes 1, 3 and 4 are present, the id value in the list will be renumbered to 1 (same), 2 (should be 3) and 3 (should be 4) if probes 2, 3 and 4 are present, the id value in the list will be renumbered to 1 (should be 2), 2 (should be 3) and 3 (should be 4)

If you renamed the JSON 'id' field to 'probeId', the renumbering still occurs. So I tried something ridiculous and renamed 'id' to 'eyedee' and the renumber did not occur.

If the 'id' field is converted to a string instead of an integer, the renumbering does not occur.

The question is: why does the renumbering occur? Is there a way to prevent it? Please let me know if I need to provide more info.

EF Core version: 6.0.5.0 Database provider: Azure.Cosmos.Core (3.21.0.0) Target framework: .NET 6.0 Operating system: MacOS 12.4 IDE: JetBrains Rider 2022.1.1

ajcvickers commented 2 years ago

@AndriySvyryd Looks like key values are being synthesized here when they should not be.

AndriySvyryd commented 2 years ago

For owned collections EF uses the position of the item as the key value and updates it to the current position. To prevent EF from modifying the id values configure them with ValueGeneratedNever():

modelBuilder.Entity<Data>(
    eb => eb.OwnsMany(
        v => v.Probes, b =>
        {
            b.Property(p => p.id).ValueGeneratedNever();
        }));
julielerman commented 1 year ago

Great find, @MysteryAchievement . Unfortunately, I've run into the same scenario and the suggested fix (valuegeneratednever) is not working. @AndriySvyryd I could privately share the sample with you . I have done MUCH science on this to try to see if I can move the needle. But the mapping isn't doing the trick and I have to include the full collection. Note that the collection is embedded. And in order to achieve the mapping (thanks to the helpful exception methods) I am first ignoring the mapping for that entity and then applying the explicit ownsmany mapping. Also notable: in my scenario, the production code should always return the entire collection anyway because its part of an aggregate) but its driving me mad that I can't get the mapping to do provide the fix. (Using 6.0.13 and 7.0.4). let me know if you have time and interest and I will invite you to the repo which is currently private. Thanks!

AndriySvyryd commented 1 year ago

@julielerman If you have a small repro then go ahead and invite me to the repo to investigate.

julielerman commented 1 year ago

Always the best advice! Creating a repo from scratch highlighted my mistake!

MysteryAchievement commented 1 year ago

Hi @julielerman - what was your mistake? Does that mean you solved this? I never got ValueGeneratedNever() to work.

julielerman commented 1 year ago

Holy smokes, it never worked for you anyway?

I spent so much time not being able to make the exception go away ("An item with the same key has already been added. Key: o") on a query. I could not get to WHICH key it was in my aggregate that has one entity mapped as ownsmany. but my IDs are GUIDs but changing the name of my keys DID make the problem go away just like you experienced. But I did not want strangely named keys so I reverted them and kept banging on the problem.

In the end, I did so many experiments in my tests, I literally can't get back to whatever was throwing the error. That's also making me bananas. LOL. I thought maybe even (a 4am idea) the fact that I had forgotten to update the provider from sqlserver to cosmos in the integration test project. I went back and tried to break it again but gave up and after a day of not being able to get rid of the problem without a big change to how I was querying the data.

Frankly, I've had so many weird things happening (v6.0.13 and 7.0.4) when doing queries that are not straightforward where code will throw one day and the next day it works. Example: I could not use projection queries because AsNoTracking was being ignored, but today it's working. I started my test project with SQL server, then switched to Cosmos. So I had some things like filtered includes which don't work in cosmos so I just don't know and give up and have to move on.

So ..resolution? No. But this was a great therapy session and I'm feeling much better, thank you. :satisfied:

MysteryAchievement commented 1 year ago

No, I never did get it to work. We are basically ignoring the problem for now, but are rapidly approaching the point where we need to come up with a solution for it before our code goes into production. It seems strange to me that more people aren’t experiencing this problem.