Change the way redis_client deserializes empty strings.

enyo commented 11 years ago

In the current version, redis_client deserializes empty values to null.

Since redis doesn't support null values by itself that's all empty Strings are thus converted to null.

I propose to change this behaviour so null and "" (empty String) both deserialize to "" (empty String). I think it prevents errors, and theoretically redis always stores an empty String, not a null value.

@dartist/owners what do you say?

mythz commented 11 years ago

I believe they're both stored in Redis as a bytes[0] so I guess that could be interpreted as null or an empty string :) But honestly I prefer using null as a marker for empty on values (and a 0-sized collection for empty collections), it simplifies checking since it can apply to all data types and you don't need to do 2 checks (i.e. null and "") to check for empty.

Is there a use-case/example that shows an issue of using null?

enyo commented 11 years ago

Well.. if you simply want to store a String, and don't check if it's empty yourself and read it again, you'll encounter a null pointer exception. So everytime you store a String you have to check if it's not empty.

I've been thinking about the whole serializer, and I'm wondering if it wouldn't make more sense to provide the serializer function to serialize objects, instead of doing it automatically. Right now, the JsonEncoder actually fails if you store a String like [ my String because it thinks it's JSON.

What if I actually want to store a String like { "myAwesome": "String" }? What if somebody chooses this as the username....

enyo commented 11 years ago

I can't think of a use case, where I don't know that I wanted to store a map (or something else serialized) and I think it's a lot more safe and less confusing if the calls where like this then:

client.set("mykey", serialize(myData));
var value = deserialize(client.get("myKey"));

mythz commented 11 years ago

Sorry for the delay, been held up in meetings all day...

Well.. if you simply want to store a String, and don't check if it's empty yourself and read it again, you'll encounter a null pointer exception. So everytime you store a String you have to check if it's not empty.

Not sure which friction point you're referring to. If we follow the SET operation down we see that the value gets delegated to the underlying raw native client and value gets passed to toBytes:

Future set(String key, Object value) => client.set(key, toBytes(value));

toBytes should probably return an empty List<int> for both "" and null here, it doesn't yet so this should be changed.

The native client delegates down sendExpectSuccess:

Future set(String key, List<int> value) =>
    conn.sendExpectSuccess([_Cmd.SET, keyBytes(key), value]);

If key is null or an empty string keyBytes throws an error as expected.

What if I actually want to store a String like { "myAwesome": "String" }? What if somebody chooses this as the username....

We need to make a decision between convenience and purity here. In order to provide a high-level API most objects are serialized as JSON, it seems wrong to do this for strings since this would be a client library opinion to JSON encode every string, which I doubt other redis bindings would be doing so JSON encoding strings would reduce interoperability.

The way to get it out as a raw string would be to use the Native Client API, e.g:

client.raw.get(key).then((bytes) { ... });

But this wouldn't be symmetrical to how they're set so it's a little inconsistent, so maybe what we need to do is publish string-specific getString/setString/etc APIs on redis string operations that just stores/retrieves raw strings?

I think it's a lot more safe and less confusing if the calls where like this then:

client.set("mykey", serialize(myData));
var value = deserialize(client.get("myKey"));

In a way the RedisNativeClient lets you do this, i.e. it doesn't add any encoding itself. IMO the RedisClient should ideally provide a convenience wrapper where this boilerplate shouldn't be necessary.

enyo commented 11 years ago

Not sure which friction point you're referring to.

At the moment, the behaviour is:

client.set("mykey", "");
var myString = client.get("mykey");
assert(myString == null);

Since null and "" are both serialized as an empty byte list, and deserialized as null.

In my opinion, client.set and client.get should always return the same value. If you "magically" convert objects to redis strings, there will always be a problem when deserializing the data (unless everything goes through JSON).

What you're saying, is that all user entered data (and Strings in general) must be handled with the RedisNativeClient since you can never know if the String provided isn't actually a JSON object.

IMO this will lead to problems because most of the users won't be aware of this problem and it will result in Maps, null, Lists and ints returned when a String is expected (which, at worst, will crash the program).

At the same time, most of the other functionality (APPEND, STRLEN, etc...) doesn't make much sense in this high level API, since you can never know what the client actually stores. We know that at the moment Strings aren't converted behind the scenes. How could I, as a user, know that? Since lots of other content is converted magically there's no certainty about it.

My proposal is to introduce the high level functions:

setObject(key, object) and
getObject(key)

Since the other methods (INCR, INCRBY, APPEND, etc...) only work on Strings (ints), there is no need to provide such a high level functionality for those.

We need to make a decision between convenience and purity here.

IMO unpredictability is the wrong decision.

mythz commented 11 years ago

At the moment, the behaviour is: assert(myString == null);

I believe this is the correct behavior. The api takes an Object and there's no difference in what gets sent over the wire between "" and null as there's no callsite generic type information available we don't know what the callsite value should be interpreted as, and the correct value for empty for all objects is null. Redis's philosophy is to return the same value for empty if the key exists with no value or the key doesn't exist (e.g. returning an empty collection when the key doesn't exist), so returning "" when the key doesn't exist here feels wrong.

In my opinion, client.set and client.get should always return the same value.

What about serializing and deserializing other data types e.g. a List, Map or boolean - should we just return strings in all cases?

My proposal is to introduce the high level functions:

There's a lot more than 2 methods you would have to add, e.g. every method containing Object value in RedisClient.dart would need an overload.

As I prefer to use a high-level client that saves boilerplate when using Redis as an Object data store, I think instead of creating mismatched overloads to fit in all in one client we should look at creating a new client (inheriting form RedisNativeClient) that just deals with unencoded Strings. So we could rename this client to RedisObjectClient and have the String-based client called RedisClient. This way users don't have to look at the implementation to infer the behavior of each method.

There could also be an convenience property on RedisClient to access the high-level RedisObjectClient, e.g:

client.objects.set("aMap", { 'key': 'value' });

enyo commented 11 years ago

Redis's philosophy is to return the same value for empty if the key exists with no value or the key doesn't exist (e.g. returning an empty collection when the key doesn't exist), so returning "" when the key doesn't exist here feels wrong.

That is incorrect.

Redis returns a (nil) Bulk reply ($-1) when a key doesn't exist. When you set an empty String, redis returns "" as expected.

Since the current implementation always returns null even when there is actually an empty string, you don't know if the key existed or not.

In my opinion, client.set and client.get should always return the same value.

What about serializing and deserializing other data types e.g. a List, Map or boolean - should we just return strings in all cases?

Well, that is exactly my point: when you set objects, you know that they are going to be serialized. The most obvious solution is of course client.set("key", serialize(object)). I would then expect:

client.set("key", serialize(object));
assert(client.get("key") == serialize(object));
// but this can't be guaranteed:
assert(object == deserialize(client.get("key"))); // Can potentially fail.

There's a lot more than 2 methods you would have to add, e.g. every method containing Object value in RedisClient.dart would need an overload.

That's true. But there aren't that many other methods (about 5) because all the integer commands can be skipped.

I think I'd probably go for a named argument like: client.set("key", object, serialize: true); This way it doesn't pollute the API.

As I prefer to use a high-level client that saves boilerplate when using Redis as an Object data store, I think instead of creating mismatched overloads to fit in all in one client we should look at creating a new client (inheriting form RedisNativeClient) that just deals with unencoded Strings.

Well. I agree with you that boilerplate code should be handled by the library. I just think that doing magic, behind the scenes, that results in unexpected behaviour is a bad solution and can even lead to serious app instabilities and vulnerabilities. Storing data inside Redis is a misuse of redis. Redis is meant to store strings. The fact that you can store objects, serialized as Strings in Redis is just a nice addition, but the intent should be clear when used in a program. So I think that using client.set("key", object, serialize: true) or even client.set("key", serialize(object)) is not that much boilerplate code for the benefit of completely predictable and clean behaviour.

mythz commented 11 years ago

That is incorrect.

Well it applies to collections at least:

If an operation targeting an aggregate data type (list,set,zset,hash) is performed against a non existing key, the behavior should be exactly the one obtained running the operation against an empty aggregate value of the same type. So for instance LLEN returns 0 if called against a non existing key, because we consider it holding an empty list. -- http://oldblog.antirez.com/post/redis-weekly-update-4.html

So what should happen when you try to SET a null string? Is it no longer supported?

Requiring serialize() all the time is redundant and users will always need to determine that when they're not storing a string it needs to be serialized. Adding an optional serialize: true still pollutes the API space with mixed level of abstraction which I don't think it's an improvement since it's not symmetrical when deserializing.

I still prefer having a separate high-level RedisObjectClient which allows you to drop in a custom encoder letting you globally change how Objects are serialized without affecting client call-site code. Why don't you change the existing RedisClient to remove all use of serialization and use all strings (which you could also use to client.set("key", serialize(object))) and I'll work on a separate new RedisObjectClient which always serializes/encodes values, including strings.

adam-singer commented 11 years ago

I like this compromise, allowing for different encoders is a good solution

enyo commented 11 years ago

A client like RedisObjectClient sounds reasonable, but do you agree that in that case, it should always encode the data with JSON (or any other serialized format)? IMO, if a high level "object" client stores a String, it should be stored as "my string", not my string. There shouldn't be any ambiguity about the content, and testing if the data has actually be serialized can be removed from the library.

mythz commented 11 years ago

Yep, will change it so values are always encoded so strings with a JsonEncoder is stored as "my string".

enyo commented 11 years ago

So, the RedisObjectClient is basically just a wrapper for the RedisClient right?

enyo commented 11 years ago

Would you extend the RedisClient class?

Basically, the only methods that would make sense in such a wrapper object are the methods that (de)serialize data, right? So extending the RedisClient is probably not that good of an idea...

What about a RedisObjectWrapper abstract base class, which can be implemented by the RedisJsonWrapper. It would instantiate a RedisClient internally (by forwarding the connect() calls) and define the basic methods used for serialized objects. If a user wants to use the native String commands, they could be available through the client getter?

What are your thoughts?

mythz commented 11 years ago

Nope, I don't see RedisClient and RedisObjectClient sharing any class hierarchy as we wouldn't extend be able to extend the string RedisClient and keep the right overloads, also wouldn't work if we wanted to use a binary encoder like MsgPack. It would just be similar just as RedisClient is now, e.g. delegating serialized values to RedisNativeClient.

It would just an extra file (like RedisClient.dart) so not worried about trying to force impl sharing, and I expect the hierarchy to just look like:

RedisNativeClient (bytes)
   +  RedisClient (strings)
   +  RedisObjectClient (objects)

To access one from the other we could expose getter properties like:

new RedisClient()
  ..objects.set("aMap", { 'key': 'value' });

and

new RedisObjectClient()
  ..strings.set("aRawString", "value");

enyo commented 11 years ago

The RedisObjectClient could also simply use the RedisClient since it just wants to store Strings anyways

mythz commented 11 years ago

The BytesEncoder returns bytes which should go straight to RedisNativeClient.

mythz commented 11 years ago

Hi Matias, if you've got any uncommitted changes locally can you commit them in the next few days, as I might have some time free soon to pick up anything that's left to do.

enyo commented 11 years ago

@mythz Hi, will do

bbss commented 10 years ago

I somehow hadn't seen this issue until a couple of days ago, but I think I've got most of your concerns covered.

My main concern was not having the library impose a security threat when deserializing as what @enyo mentioned what would happen when something that is not a string would be deserialized as an instance. That's why I did not write a recursive method for deserialization. Luckily I found the only places I have to deserialize anything beyond one level of depth is the places where Redis already provides an abstraction via its datatypes already; the lists, sets and hashes.

I was thinking about what to do with custom classes and thinking about a similar approach to what you did before with DateTimes (prepending "Date(" and appending ")").

I kept that date (de-)serialization in but might remove that too and leave it all to the user, since if we want to provide deserialization options beyond that abstraction we would be have to make users aware of how this information is stored since other functionality Redis offers (like operations on strings) might behave unexpectedly when they are prepended with type information.

Now we could make our library store type information in a separate hash or something, but then we would be making an ORM (which would be pretty cool to me) but looking through the rest of the Redis API's listed on www.redis.io/clients ORM's get special mention below the clients.

dartist / redis_client

Change the way redis_client deserializes empty strings. #24