3rd-Eden / memcached

A fully featured Memcached client built on top of Node.js. Built with scaling in mind, so it supports Memcached clusters and consistent hashing.
MIT License

Smart sanitization of keys #235

Open matthewwithanm opened 9 years ago

matthewwithanm commented 9 years ago

Length isn't the only restriction that memcached places on keys; they also can't contain spaces or control characters.
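To make the restrictions concrete, here is a minimal sketch of a validity check. The function name `isValidKey` and both constants are hypothetical and not part of node-memcached; the 250 limit is the one mentioned later in this thread, and measuring it in UTF-8 bytes is an assumption.

```javascript
'use strict';

// Hypothetical helper, not part of node-memcached's API.
// Assumption: the 250 limit is measured in UTF-8 bytes.
const MAX_KEY_BYTES = 250;

// Space (0x20), the control characters below it, and DEL (0x7f).
const INVALID_CHARS = /[\x00-\x20\x7f]/;

function isValidKey(key) {
  if (typeof key !== 'string' || key.length === 0) return false;
  if (Buffer.byteLength(key, 'utf8') > MAX_KEY_BYTES) return false;
  return !INVALID_CHARS.test(key);
}
```

A key such as `user:42` passes this check, while `user 42` (space) and `a\nb` (newline) do not.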

It would be great if there were an option (keySanitization?) with behavior similar to keyCompression for automatically handling the invalid keys. I can submit a PR if this sounds like a good approach.

3rd-Eden commented 9 years ago

I guess it would be nice to have, but the question is what should be done once we see those chars? Just remove them?

matthewwithanm commented 9 years ago

Usually libraries will hash the key, or combine a hash of the key with an abridged/cleaned version of the key.

3rd-Eden commented 9 years ago

The problem here is that we want to have some sort of compatibility with other libraries. If the majority of libraries are doing hashing, we should do so too. If they are removing the offending characters, then that's the pattern we should follow. But I don't know nearly enough about key sanitization in other libraries to have an opinion on these things.

matthewwithanm commented 9 years ago

I don't think there's a standard. It's pretty much in exactly the same boat as keys > 250 chars—using md5 hashing for compression seems like a reasonable decision but it's not necessarily what other, non-node libraries will do.

My personal preference is (in both the too-long case and the bad-chars case) to try to maintain meaningful keys by appending a hash of the entire key to a sanitized and truncated version. (I took this approach in django-imagekit.) However, just using a hash is fine too. Although, not to get too off topic, I'm realizing that the hash-only approach (the one currently used for keyCompression) opens the door to collisions, however unlikely, where the user may actually want to use the same hash that node-memcached has generated as a key for something else. Ideally, compressed keys wouldn't be in the same space as plain user keys. That is, any keys that node-memcached builds via transformation would themselves be transformed if supplied directly by the user.
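The approach described above could be sketched roughly as follows. This is an illustration only, not node-memcached's or django-imagekit's actual implementation: `sanitizeKey`, the `_` replacement character, and the `:<md5>` suffix format are all assumptions made for the example.

```javascript
'use strict';
const crypto = require('crypto');

// Hypothetical names and formats throughout; none of this is library API.
const MAX_KEY_LENGTH = 250;

// Shape of a key this sketch has produced: ':' plus 32 hex digits at the end.
const TRANSFORMED_SHAPE = /:[0-9a-f]{32}$/;

function md5(text) {
  return crypto.createHash('md5').update(text, 'utf8').digest('hex');
}

function sanitizeKey(key) {
  // Replace everything outside printable, non-space ASCII with '_'.
  const cleaned = key.replace(/[^\x21-\x7e]/g, '_');

  // Transform when the key has bad characters, is too long, or already
  // looks like a transformed key. The last condition keeps generated
  // keys out of the space of plain user keys.
  const needsTransform =
    cleaned !== key ||
    key.length > MAX_KEY_LENGTH ||
    TRANSFORMED_SHAPE.test(key);
  if (!needsTransform) return key;

  // Hash the entire original key, then append it to a truncated,
  // readable prefix so the result still fits within the limit.
  const suffix = ':' + md5(key);
  return cleaned.slice(0, MAX_KEY_LENGTH - suffix.length) + suffix;
}
```

With this sketch, `sanitizeKey('user:42')` returns the key unchanged, while `sanitizeKey('user 42')` returns `user_42:` followed by the MD5 of the original key. Passing that result back in transforms it again, so a user-supplied key can only coincide with a generated one through an MD5 collision.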