Azure / azure-cosmos-dotnet-v2

Contains samples and utilities relating to the Azure Cosmos DB .NET SDK
MIT License
579 stars 837 forks source link

Unable to use camel case in JSON when C# types use pascal case #317

Open Liversage opened 7 years ago

Liversage commented 7 years ago

I want to use a custom JsonSerializerSettings to control the format of the JSON stored in CosmosDB. In particular I want to use CamelCasePropertyNamesContractResolver to ensure that the JSON is camel cased even though the C# types use pascal case.

With the latest releases of the Microsoft.Azure.DocumentDB NuGet package it is now possible to provide a JsonSerializerSettings when creating a DocumentClient. However, in some cases the camel case versus pascal case issue is still a problem.

I experience two distinct problems but I have taken the liberty to include both in this report.

I use two document classes:

public class TestDocument
{
    public string Id { get; set; }
    public int Number { get; set; }
    public EmbeddedDocument Embedded { get; set; }
}

public class EmbeddedDocument
{
    public string Text { get; set; }
}

Here is code to create an document in CosmosDB based on these classes:

var jsonSerializerSettings = new JsonSerializerSettings {
    ContractResolver = new CamelCasePropertyNamesContractResolver()
};
var client = new DocumentClient(
    new Uri("https://localhost:8081/"),
    "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==",
    jsonSerializerSettings);
var database = new Database { Id = "TestDatabase" };
await client.CreateDatabaseIfNotExistsAsync(database);
var collection = new DocumentCollection { Id = "TestCollection" };
var databaseUri = UriFactory.CreateDatabaseUri(database.Id);
await client.CreateDocumentCollectionIfNotExistsAsync(databaseUri, collection);
var testDocument = new TestDocument {
    Id = "Alpha",
    Number = 42,
    Embedded = new EmbeddedDocument { Text = "Beta" }
};
var collectionUri = UriFactory.CreateDocumentCollectionUri(database.Id, collection.Id);
await client.CreateDocumentAsync(collectionUri, testDocument);

The JSON in CosmosDB is properly camel cased as expected (I have removed the underscore prefixed properties for clarity):

{
    "id": "Alpha",
    "number": 42,
    "embedded": {
        "text": "Beta"
    }
}

My first problem is that querying using LINQ does not work:

var query = client.CreateDocumentQuery<TestDocument>(collectionUri)
    .Where(document => document.Number == 42)
    .AsDocumentQuery();
var results = new List<TestDocument>();
while (query.HasMoreResults)
    results.AddRange(await query.ExecuteNextAsync<TestDocument>());

At this point results contains no elements and not one element as expected. The reason is that the LINQ provider does not understand that the property to use in the query is number and not Number.

My second problem occurs when I want to do a partial update of a document. It is possible to avoid a complete deserialization and serializtion of the JSON document by using the following code:

var documentUri = UriFactory.CreateDocumentUri(database.Id, collection.Id, "Alpha");
var response = await client.ReadDocumentAsync(documentUri);
var embedded = response.Resource.GetPropertyValue<EmbeddedDocument>("embedded");
embedded.Text = "Gamma";
response.Resource.SetPropertyValue("embedded", embedded);
await client.ReplaceDocumentAsync(response.Resource);

Notice that only EmbeddedDocument is deserialized and then serialized again after it has been modified.

Unfortunately, the resulting JSON in CosmosDB after the code above has executed is the following:

{
    "id": "Alpha",
    "number": 42,
    "embedded": {
        "Text": "Gamma"
    }
}

Notice how the embedded.text property has changed case to embedded.Text (from camel case to pascal case). The reason is that the Document class which is the value of response.Resource does not use the custom JsonSerializerSettings.

ealsur commented 7 years ago

@Liversage Just to help others debug the issue, which version of the SDK are you using? 1.16.1?

Liversage commented 7 years ago

Sorry, forgot to mention, but yes, I use the latest version which as of this time of writing is 1.16.1.

kirankumarkolli commented 7 years ago

Partial document update: This is an existing feature gap and is under review. One possible work-around is to use stored procedures. Stored procedure can read the document and patch it. This will avoid one extra network call.

Liversage commented 7 years ago

@kirankumarkolli While partial updates certainly would be nice I'm describing a slightly different scenario where the entire document is retrieved, modified and then stored again. However, only part of the document is deserialized before it is modified and then serialized again. All the JSON that is not affected by the update is kept intact. This reduces the CPU and memory required on the client side.

I realize that I use the term "partial update" in my issue. However, I should probably have used "partial deserialization and serialization" instead.

kirankumarkolli commented 7 years ago

non-generic ReadDocumentAsync is somewhat on this path (not fully what you are looking for). This API does a first level de-serialization though.

Liversage commented 7 years ago

@kirankumarkolli Thanks for the response.

Actually, the issue I am reporting is that when I use the non-generic ReadDocumentAsync (as you suggest) and then use ReplaceDocumentAsync to write the response resource returned by ReadDocumentAsync then the CamelCasePropertyNamesContractResolver is no longer used for serialization (second problem in my report).

I understand why CosmosDB has to control the serialization of the root of the document because it contains some CosmosDB specific properties. However, the behavior I experience where a CamelCasePropertyNamesContractResolver only is used in some cases is still surprising and essentially makes it impossible to use it.

bentefay commented 7 years ago

@Liversage Regarding your first problem, I have found that adding JsonProperty attributes to all the properties you wish to query from LINQ causes the SDK to emit the correct query (with 1.17.0). Does this work for you?

public class TestDocument
{
    public string Id { get; set; }
    [JsonProperty("number")]
    public int Number { get; set; }
    public EmbeddedDocument Embedded { get; set; }
}

Obviously this is unnecessarily redundant though, so it would be great to have the SDK actually respect the ContractResolver when transpiling LINQ queries.

kirankumarkolli commented 7 years ago

@Liversage we will investigate ReplaceDocumentAsync not using serialization settings part.

pendragon-andyh commented 7 years ago

Hi Martin

Can we see a sample of your code that demonstrates the problem?

Not sure if this is your issue ... when I was playing with the Document object (in 1.15) I found that the SetPropertyValue method was not respecting the serialization settings - and seemed to make a serialised copy of the object at the point that the method was invoked.

Regards Andy

On 15 Aug 2017 21:47, "Martin Liversage" notifications@github.com wrote:

@kirankumarkolli https://github.com/kirankumarkolli Thanks for the response.

Actually, the issue I am reporting is that when I use the non-generic ReadDocumentAsync (as you suggest) and then use ReplaceDocumentAsync to write the response resource returned by ReadDocumentAsync then the CamelCasePropertyNamesContractResolver is no longer used for serialization (second problem in my report).

I understand why CosmosDB has to control the serialization of the root of the document because it contains some CosmosDB specific properties. However, the behavior I experience where a CamelCasePropertyNamesContract Resolver only is used in some cases is still surprising and essentially makes it impossible to use it.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Azure/azure-documentdb-dotnet/issues/317#issuecomment-322584009, or mute the thread https://github.com/notifications/unsubscribe-auth/AF9f5YOcO4xyuy6gAkY8IMs86H09CjDCks5sYgPKgaJpZM4OzS_c .

Liversage commented 7 years ago

@pendragon-andyh My issue contains the code that demonstrates the problem. You can create a console application, add the NuGet package, add the two class definitions and paste the remaining code into the Main method and add some using statements. That's all. Personally, I use LINQPad to do this for even lower friction.

pendragon-andyh commented 7 years ago

Sorry, I didn't see example (using pokey little phone).

Put a break point after the SetPropertyValue call. You will probably see that the Document object has already incorrectly serialised your Text property.

I am not anywhere near a proper computer (so this may be another dead end). Json.net's JObject contains a method that will serialise gamma using a JsonSerializationSetting instance. Try using the result as tge object you pass to SetPropertyValue.

Unfortunately DocumentClient does not expose its settings as a property.

On 17 Aug 2017 07:31, "Martin Liversage" notifications@github.com wrote:

@pendragon-andyh https://github.com/pendragon-andyh My issue contains the code that demonstrates the problem. You can create a console application, add the NuGet package, add the two class definitions and paste the remaining code into the Main method and add some using statements. That's all. Personally, I use LINQPad to do this for even lower friction.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Azure/azure-documentdb-dotnet/issues/317#issuecomment-322981315, or mute the thread https://github.com/notifications/unsubscribe-auth/AF9f5YNDfIDSvocPLHs36c9idc-oJjXkks5sY95XgaJpZM4OzS_c .

Liversage commented 7 years ago

Thank you @pendragon-andyh, that is a neat workaround. I get the desired result if I replace the last chunk of code in my issue with the following slightly modified code:

var documentUri = UriFactory.CreateDocumentUri(database.Id, collection.Id, "Alpha");
var response = await client.ReadDocumentAsync(documentUri);
var embedded = response.Resource.GetPropertyValue<EmbeddedDocument>("embedded");
embedded.Text = "Gamma";
// Workaround to force the serializer.
var jsonSerializer = JsonSerializer.Create(jsonSerializerSettings);
var jObject = JObject.FromObject(embedded, jsonSerializer);
response.Resource.SetPropertyValue("embedded", jObject);
await client.ReplaceDocumentAsync(response.Resource);

I believe that the incorrect serialization that you describe happens in the JsonSerializable.SaveTo method. Calling SetPropertyValue sets the value of the property in the JObject field of JsonSerializable used to represent the JSON document. However, the serialization is only done when it is actually required.

Now, if LINQ queries also worked with camel case without having to put JsonProperty attributes all over the place... 😉

pendragon-andyh commented 7 years ago

Glad to have helped.

I'm interested @Liversage ... does your c# "make a change to the Resource object" perform much better than "fully deserializing the type, making the change, and then serializing the full object"? My gut feel for the API is that it would not be very different.

Also, at a wider level your code is currently doing:

This is 2 round-trips - and has a risk that the object has changed between the read and the write operations. Even if you use the Etag optimistic-concurrency trick there is a risk that you will get a consistency exception and then have to retry.

The API's seems to favour the use of a stored-procedure for this type of operation. In that case the entire read+change+write would happen as a single atomic unit against the write-primary server. This would be 1 network round-trip and has zero chance of a consistency error. My only problems with sprocs are they kind-of go against the whole schema-less thing (because you have to deploy them and consider versioning) and you have to faff around with another language.

Also interested in how much the LINQ thing is a deal-breaker to you? Note: I am trying to write a API over DocumentDB (aiming for "productive, safe, and doesn't kick your POCO's in the nuts"). I am currently using the SqlQuerySpec approach (I'm a Dapper instead of EF guy). For complex queries, people seem to be having a whole heap of grief because of the disparities between LINQ syntax and what Cosmos DB's syntax.

Regards Andy

Liversage commented 7 years ago

@pendragon-andyh, to answer your questions:

[...] does your c# "make a change to the Resource object" perform much better than "fully deserializing the type, making the change, and then serializing the full object"? My gut feel for the API is that it would not be very different.

I haven't measured but I don't think so. I just implemented it like this because it seemed more efficient, but when the NuGet package allowed me to specify JsonSerializerSettings I discovered the issues I am reporting.

[...] has a risk that the object has changed between the read and the write operations

Not in my application. It is actor based and every document is owned by a specific single threaded actor. In general you are right of course.

Also interested in how much the LINQ thing is a deal-breaker to you?

The LINQ problem is a deal breaker because I just want to write and read POCO classes to and from CosmosDB without the hassle of decorating these classes with attributes. My solution is to use pascal case in the JSON. In the future some of this JSON will be served to a browser where the front-end developer will have to live with this formatting.

For complex queries, people seem to be having a whole heap of grief because of the disparities between LINQ syntax and what Cosmos DB's syntax.

My LINQ requirements are simple: I have to be able to filter on and perform ordering based on property values and limit the number of results. I have no problems doing that with the available API.

pendragon-andyh commented 7 years ago

Thanks for replying.

On 17 Aug 2017 10:28, "Martin Liversage" notifications@github.com wrote:

@pendragon-andyh https://github.com/pendragon-andyh, to answer your questions:

[...] does your c# "make a change to the Resource object" perform much better than "fully deserializing the type, making the change, and then serializing the full object"? My gut feel for the API is that it would not be very different.

I haven't measured but I don't think so. I just implemented it like this because it seemed more efficient, but when the NuGet package allowed me to specify JsonSerializerSettings I discovered the issues I am reporting.

[...] has a risk that the object has changed between the read and the write operations

Not in my application. It is actor based and every document is owned by a specific single threaded actor. In general you are right of course.

Also interested in how much the LINQ thing is a deal-breaker to you?

The LINQ problem is a deal breaker because I just want to write and read POCO classes to and from CosmosDB without the hassle of decorating these classes with attributes. My solution is to use pascal case in the JSON. In the future some of this JSON will be served to a browser where the front-end developer will have to live with this formatting.

For complex queries, people seem to be having a whole heap of grief because of the disparities between LINQ syntax and what Cosmos DB's syntax.

My LINQ requirements are simple: I have to be able to filter on and perform ordering based on property values and limit the number of results. I have no problems doing that with the available API.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Azure/azure-documentdb-dotnet/issues/317#issuecomment-323017751, or mute the thread https://github.com/notifications/unsubscribe-auth/AF9f5UtIVwUluBUvIDNFpvbCj8rf83okks5sZAelgaJpZM4OzS_c .

bfarrimo commented 6 years ago

The fact that the LINQ provider doesn't pick up the global JSON serializer settings is an issue, and using the JsonPropertyAttribute is at best a poor workaround with a few caveats as I'll demonstrate below.

In the case of generic methods, they don't always read the attribute that you (or at least I) would I expect:

Scenario 1 Given the call client.CreateDocumentQuery<TestEntity>(uri).Where(x => !x.Deleted) correctly looks at the attribute on my 'TestEntity'.

Scenario 2 Given the following code:

interface IDeletable
{
    bool Deleted { get; set; }
}

void Test<TEntity>(DocumentClient client, Uri uri) where TEntity : IDeletable
{
    client.CreateDocumentQuery<TEntity>(uri).Where(x => !x.Deleted);
}

Test<TEntity>(client, uri);

Here, the API looks for a JsonPropertyAttribute on the Deleted property of the IDeletable interface, rather on the TEntity (in this case TestEntity) type.

Scenario 3 Given the following code:

interface IDeletable
{
    bool Deleted { get; set; }
}

void PredicateTest<TEntity>(DocumentClient client, Uri uri, Expression<Func<TEntity, bool>> predicate) where TEntity : IDeletable
{
    client.CreateDocumentQuery<TEntity>(uri).Where(predicate);
}

PredicateTest<TestEntity>(client, uri, x => !x.Deleted);

The API again correctly reads the JsonPropertyAttribute on the Deleted property of the TestEntity type.

Therefore, the correct workaround currently to use the LINQ provider is to place JsonProperty attributes on each property in your class and any interfaces it inherits from in case it's referred to like that at any point.

This is also required for the Id properties on your documents, as DocumentDb requires a lower case 'id' field and any LINQ queries using said field will not work properly.

I would obviously much prefer to be able to simply set my JSON serializer settings globally (to be camel cased) and then have the LINQ provider respect this in all cases, rather than have to jump through so many hoops and find issues almost every time I do something new with DocumentDb.

jroselle commented 6 years ago

I wanted to add to bfarrimo's comment by mentioning that JsonProperty attribute also does not work on inherited class properties with the override modifier.

justinSelf commented 6 years ago

I'm experiencing this issue with ExecuteStoredProcedureAsync as well. I've setup the client to use the CamelCasePropertyNamesContractResolver but the documents are still being serialized as Pascal.

Using Core Client 1.5.1

bytelabsco commented 6 years ago

In addition to what @justinSelf said regarding stored procedures, if you specify custom serializer settings, any parameters you pass in will be lost by the time the stored procedure executes, and all parameters will have no value.

reddy6ue commented 6 years ago

This decision to use only Pascal case and to not let the user set default serialization settings in DocumentDB seems to be a major design flaw. I've also had the same experience dealing with this quirk of CosmosDB.

The best solution in my experience is to let the users decide which serialization settings they want on any particular collection.

FVilevski commented 6 years ago

Is there going to be fix for this anytime soon ?

Liversage commented 6 years ago

@reddy6ue

This decision to use only Pascal case and to not let the user set default serialization settings in DocumentDB seems to be a major design flaw. I've also had the same experience dealing with this quirk of CosmosDB.

CosmosDB does not prescribe anything about the format of the JSON that you store. This is a problem in .NET SDK that you use to access CosmosDB that has a problem fixing the mismatch between C# conventionally using pascal case and JSON conventionally using camel case.

I have considered not using this SDK and instead writing my own library to access the REST API directly to avoid this serialization issue. However, there is actually a lot going on in the SDK and the effort required is not justified if the goal is to fix a casing issue.

reddy6ue commented 6 years ago

@Liversage You are right. It's not a cosmosdb serialization issue, but the the default Newtonsoft serializer that comes with the azure sdk. I still don't understand why it's so hard to allow users to inject a serializer of their choosing.

creyke commented 6 years ago

I'm using a PascalCase language to write and javascript to read, hence the need to enforce camelCase. Now I can't use Linq. Please prioritize this fix.

moshegutman commented 6 years ago

Yeah this is annoying. My current workaround is to build the query manually.

var parameters = new Microsoft.Azure.Documents.SqlParameterCollection()
    {
        new Microsoft.Azure.Documents.SqlParameter("@number", number)
    };

Microsoft.Azure.Documents.SqlQuerySpec querySpec = new Microsoft.Azure.Documents.SqlQuerySpec("SELECT * FROM root WHERE (root.number <= @number)", parameters);

var query = documentClient.CreateDocumentQuery<TestDocument>(UriFactory.CreateDocumentCollectionUri("test", "testdocuments"), querySpec).AsDocumentQuery();

while (query.HasMoreResults)
{
    var response = await query.ExecuteNextAsync<TestDocument>();
    testDocuments.AddRange(response);
}
scott-lin commented 6 years ago

+1, please prioritize fixing this issue.

tk-isme commented 6 years ago

Bump.

dominikfoldi commented 6 years ago

Any update on this? This feature has to be a basic thing in the SDK...

DanShoubridge commented 6 years ago

bump

https://feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/31355917-linq-provider-should-respect-jsonserializersetting

WillRock19 commented 6 years ago

Do you guys have any update on the subject? The JSON format still a problem...

mnmr commented 6 years ago

Can we please open-source the SDK so people can fork the code and/or fix this issue without having to reinvent the wheel? It's quite a disgrace that this is still an open issue.

moshegutman commented 6 years ago

@arramac Can we get an update on this?

alexandertsema commented 6 years ago

@arramac any updates?

github-usr-name commented 5 years ago
    // See https://github.com/Azure/azure-cosmosdb-dotnet/issues/317 ... this code is a ridiculously brittle kludge that
    // works for now
    private static IQueryable<T> MakeCamelCase<T>(this IQueryable<T> query, IDocumentClient documentClient, Uri collUri, FeedOptions opts)
    {
        if (query.Expression.NodeType != System.Linq.Expressions.ExpressionType.Constant)
        {
            dynamic sqlQueryObject = JsonConvert.DeserializeObject(query.ToString());
            string sql = sqlQueryObject.query;
    // Regex is actually a private member w/ Compiled flag set to avoid overhead.  Not that it really matters,
    // the guts of MS.DocumentDB.Core use uncached reflection with abandon
            sql = new Regex("\\[\"(.+?)\"\\]").Replace(sql, match => $"[\"{match.Groups[1].Value.ToCamelCase()}\"]");
            query = documentClient.CreateDocumentQuery<T>(collUri, sql, opts);
        }

        return query;
    }
A139008 commented 5 years ago

Please prioritise this fix - causing too many people too much grief

dominikfoldi commented 5 years ago

They says on their forum that this issue is not planned for now, but come on, this is really a problem..

https://feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/31355917-linq-provider-should-respect-jsonserializersetting

FVilevski commented 5 years ago

@dominikfoldi Well.. lets vote with 3 votes then all of us, so they can see that this thing is really needed.

dominikfoldi commented 5 years ago

@FVilevski I have already done that! I recommend it for everyone else, too!

icelandgrecian commented 5 years ago

This is not helpful approach as it is at the moment and doesn´t sell CosmosDB well for developers. Very confusing that you have this helpful LINQ search provider, but have to add JSONProperty attributes on all the models to make then queryable, especially when the FeedOptions has a global JsonSerializerSettings property.

JordanMarr commented 5 years ago

Any updates on this issue? I'm on my 3rd day of Document DB stuff and lost lots of time on this bug.

JordanMarr commented 5 years ago

FWIW, my work around is slightly different.
I didn't want to put JsonProperty attributes on all my objects, so I just put it on my "Id" properties (otherwise it will add an addition "id" property and my PK checks fail) and let all other properties serialize to PascalCase.

    type Task = { 
        [<JsonProperty("id")>]
        Id: string
        Name: string 
    }

Pros:

Cons:

Liversage commented 5 years ago

@JordanMarr While I cannot speak on behalf of Microsoft I except that all development effort is focused on the v3 version of the SDK and I will be surprised if they fix this issue in the v2 SDK. As far as I can tell there is still no LINQ support in the current v3 preview version but the issue Support LINQ query generation has the tag Required for GA so I'm hoping for the best.

razlani commented 5 years ago

Yawn, this'd be solved pretty fast if they accepted pull requests on an open codebase..

github-usr-name commented 5 years ago

That'd imply embracing open-source workflows... one thing at a time! FWIW, although it's a bit of a PITA at first, I find using a "SQL" query preferable anyway because (1) you can set the capitalisation correctly, (2) you know exactly which bits are going to run where and (3) it avoids the overhead of the LINQ provider. Plenty of disadvantages too, not saying it's the right approach for all use-cases.

arramac commented 5 years ago

@Liversage , @JordanMarr , @razlani - we just added LINQ to SDK v3. Happy to accept PRs there.

Linq in V3 SDK https://github.com/Azure/azure-cosmos-dotnet-v3/pull/209 LINQ V3 Bug fixes https://github.com/Azure/azure-cosmos-dotnet-v3/issues/330

As to this specific topic, the debate we've had is if this change would require supporting all of JsonSerializerSettings (which would require significant work), or if we can support CamelCase in isolation. We're taking a look and will report back.

nhwilly commented 5 years ago

camelCase is the most widely used, I believe.

I am now matching client dtos against server dtos against models and then into CosmoDb on dozens of classes.

Everything is annotated because linq won't work.

A giant pita, fwiw.

sehcheese commented 5 years ago

@arramac What was the outcome of your discussion? Are camelCase serializer settings supported in SDK v3 right now?

nhwilly commented 5 years ago

At this point, I'm tempted to just name all my properties so they start with lower case and end the insanity of JSON attributes (and their typos) everywhere.

Anyone know why that won't bite me later?

j82w commented 5 years ago

@sehcheese and @nhwilly v3 SDK just merged support for CamelCase in Linq. There should be a release in the next few weeks with the change.

nhwilly commented 5 years ago

Ok, that's pretty exciting. Looking forward to deleting a large number of attributes on all my POCOs.