bchavez / RethinkDb.Driver

:headphones: A NoSQL C#/.NET RethinkDB database driver with 100% ReQL API coverage.
http://rethinkdb.com/api/java
Other
384 stars 134 forks source link

JObject roundtrip issue #39

Closed oliverjanik closed 8 years ago

oliverjanik commented 8 years ago

As promised here's a test that fails JObject roundtrip failing test. Just drop it into the GitHubIssues.cs

        [Test]
        public void issue_39()
        {
            var dateTime = DateTime.Parse("2016-09-03T10:30:20Z");
            var obj = new JObject(new JProperty("timestamp", dateTime));

            var table = R.Db(DbName).Table(TableName);
            table.Delete().Run(conn);

            var result = table.Insert(obj).RunResult(conn);
            var id = result.GeneratedKeys[0];

            var check = table.Get(id).RunAtom<JObject>(conn);
            check.Dump();

            var dt = (DateTime)check["timestamp"];
            dt.Should().Be(dateTime);
        }

There's the output of check.Dump():

{
  "id": "e605a378-d8f9-4d0d-b910-e4707c0ce6b0",
  "timestamp": {
    "$reql_type$": "TIME",
    "epoch_time": 1472898620,
    "timezone": "+10:00"
  }
}
bchavez commented 8 years ago

Hey Oliver,

Thanks for taking the time to report this. I appreciate it.

I think what's happening here is when you specify RunAtom<JObject> the driver passes JObject to Newtonsoft. This causes Newtonsoft totally bypasses our converters because it doesn't need them for converting a JObject. So, check remains a JObject pseudo type. And :boom: goes the exception. Using a JObject is like manual mode. There's a blurb about raw types in the documentation Raw Format Options wiki page here. I updated a note about it under Extra C# Features:JObject Support wiki page too.

If you want to pull this DateTime out of the JObject pseudo type, you'll need to do something like this:

var dt = check["timestamp"].ToObject<DateTime>(Net.Converter.Serializer);

Let me know if this helps.

PS. I'm going on off-grid tomorrow morning. So I won't be back for a couple of weeks. Also, I won't be around the Internet very much during my time off.

oliverjanik commented 8 years ago

It's good that you have that documented, but it doesn't help with my use-case. RethinkDB is a JSON store. All I'm doing is providing a thin HTTP wrapper around it. PUT saves json GET retrieves it back. As it stands I haven't found a good way to achieve non-destructive round-trip. Particularly when DateTimes are involved.

I think the driver should attempt and interpret the datums as native types if possible. If it comes accross $reql_type$: "TIME" it will convert that into a DateTimeOffset. The users of the library should be blissfully unaware of the underlying data model.

oliverjanik commented 8 years ago

Looks like I'll need to to a post-processing pass on the JObjects. I can't have POCOs, unfortunately.

The problem stems from impedance mismatch between JSON and native REQL representation. JSON does not have advanced types like DateTime.

What happens here is you get value like 2016-09-03T10:30:20Z on the wire in JSON. JSON.NET interprets this as DateTime object for some reason then Reql driver saves it as native TIME type.

On the way back ReqlDriver makes no interpretation $reql_type$ and I get what's above.

I'm wondering what the solution is here? Should Reql Driver have a mode where it interprets $reql_time$ or should it have hooks for custom serialization? Or both?

bchavez commented 8 years ago

Hi @oliverjanik,

So I've been sort of avoiding this particular issue because of the complexities involved. But I guess it's time to discuss the music. :musical_keyboard: :smile_cat:

What happens here is you get value like 2016-09-03T10:30:20Z on the wire in JSON. JSON.NET interprets this as DateTime object for some reason then Reql driver saves it as native TIME type.

In particular, I need some context, from what perspective?

App -> Driver -> Wire -> RethinkDB or RethinkDB -> Wire -> Driver -> App?

In both cases, 2016-09-03T10:30:20Z is $reql_type$:TIME over the wire to and from RethinkDB server.

Now, as you've described, the problem arises when you have requested RunHelper<JObject> where T is JObject. Currently, we immediately take the wire JSON and feed it to Newtonsoft by saying something like JsonConvert.DeseralizeObject<JObject>(wire_json, UseOurSeralizer).

And here's where it gets tricky. Since JObject is a Newtonsoft special token type, I am not sure our UseOurSeralizer with $reql_type$ converters are considered during deserialization. My gut feeling here is I think Newtonsoft does not fully parses the response. It's almost like Newtonsoft lazy-loads the wire JSON only up to the first JObject token. And by this time, we've already returned JObject to the user. So how would our $reql_type$ converters ever get kicked into gear when Newtonsoft only parses the very first token of the response (not the entirety of the response)?

So far, the only solutions I can really think of are:

Solution A One solution is to have a special test inside all the Run* methods to test for T is JToken, then run the entirety of the response through a 2nd stage converter (like Solution B). Just feels ugly and hackish, and doesn't feel right. :cry:

Solution B Give the user a conversion utility helper? Originally back in November, we had a more proactive converter. See this commit. whereby all the $reql_type$ tokens were replaced using JSON Path. We could possibly bring this back and expose it to the user. Again, just feels ugly, hackish and doesn't feel right. :cry:

So I could be completely wrong since it's so very late here... but I think this is fundamentally what is happening here and I don't think there's an easy answer for this.

oliverjanik commented 8 years ago

Sorry, I wasn't clear enough. The flow of data is like this:

Client -> Wire -> MyApi -> Driver -> RethinkDB

In MyApi I receive data using WebAPI controller typed as JObject. This is where the conversion from 2016-09-03T10:30:20Z to DateTime happens for some reasons. I don't think this is good behaviour but it is what it is.

I pass this JObject without any modifications to the Driver which converts DateTime to $reql_type$:TIME. This is correct behaviour.

On the way back I have this:

RethinkDB -> Driver -> MyApi -> Wire -> Client

I grab my object from the Driver by id typed as JObject and I don't get the DateTime back. This is where I think the interpretation should happen so I would get DateTime back.

Then I return this JObject as response from my controller.

Result:

The net result is that from Client's perspective they don't get what they stored within my system.

oliverjanik commented 8 years ago

I don't think the driver should treat JObject any different to other objects. That is implementation detail.

I do believe you need a 2 stage parser. If I ask for IDictionary<string, object> and one of the values is $reql_type$:TIME, what do I get back? I would expect DateTime.

bchavez commented 8 years ago

Unfortunately, after reviewing some Newtonsoft source it looks like we would need a second stage parser specifically for JObject to resolve this impedance mismatch. The JsonReader is non-cached, forward-only parser. :cactus: :fire:

The only reasonable alternative it seems is to bring back our JSON Path converter. Use JSON Path to iterate over the JObject and pluck out all the $reql_type$ pseudo types and replace them appropriately.

I'll need a few days to deal with this since this is a major breaking change and would mean we'd need to support raw options again.

bchavez commented 8 years ago

Also, @oliverjanik , after exploring your specific issue more from

Client -> Wire -> MyApi

You can configure Newtonsoft to parse JSON date / times as DateTimeOffsets with the following

var json = @"{
                ""name"":""hello kitty"",
                ""dob"":""2016-09-03T10:30:20Z""
              }";

var settings = new JsonSerializerSettings
    {
        DateParseHandling = DateParseHandling.DateTimeOffset
    };

var j = JsonConvert.DeserializeObject<JObject>(json, settings);
// now, internally j["dob"] should be DateTimeOffset, no longer DateTime.
oliverjanik commented 8 years ago

Thanks for the DateParseHandling tip.

If I set it to DateParseHandling.None my timestamp should stay a string and that would avoid this roundtrip issue. It would prevent me from running time functions on the database though.

bchavez commented 8 years ago

Hi @oliverjanik , 2.3.1-beta-2 should have this fix now. All JToken return types should have their underlying pseudo types converted by default. If you still want to retain old raw type behavior, you'll need to specify .Run(conn, new { time_format="raw"}). Let me know how it goes. Thanks!

https://www.nuget.org/packages/RethinkDb.Driver/2.3.1-beta-2

-Brian

oliverjanik commented 8 years ago

Thanks heaps, you're a legend.

oliverjanik commented 8 years ago

@bchavez I've just upgraded to 2.3.3 stable and I still see this:

{
  "_createdBy": "oliver",
  "_storedAt": {
    "$reql_type$": "TIME",
    "epoch_time": 1464077642.148,
    "timezone": "+00:00"
  },
  "id": "03233075-1c5b-4df5-ba2e-50de71f58949",
  "stamp": "2016-05-24T17:32:03Z",
  "test": "hello!"
}

Is there anything I need to configure?

oliverjanik commented 8 years ago

Alright I had RunAtomAsync<object>(m.Conn) in the code and I changed it to RunAtomAsync<JObject>(m.Conn) and it looks ok apart from #64

The abstraction leaks. :-)