dominictarr / scuttlebutt

peer-to-peer replicatable data structure
Other
1.32k stars 66 forks source link

Scuttlebutt#clone requires your scuttlebutts use constructor functions, don't take arguments or require modification on an instance #30

Open jfhbrook opened 10 years ago

jfhbrook commented 10 years ago

Here's the important code in Scuttlebutt#clone, which actually creates and sets up the cloned object:

var A = this
var B = new (A.constructor)
B.setId(A.id) //same id. think this will work...

To explain why this is restrictive, I'm going to show a few examples (sketches, really) and use them to illustrate how Scuttlebutt#clone breaks. They're based on a number of refactors I tried with expiry-model to get it to play nice with level-scuttlebutt which appears to call Scuttlebutt#clone internally.

So, first, expiry-model doesn't use constructors, because Jake likes to fuck with me and write things that break expectations:

var createSomeModel = function (opts) {
  var scuttlebutt = new Scuttlebutt(),
      coolThing = thing(opts);

  // ( Define some more functions and variables which use
  // our scuttlebutt and our coolThing )

  // Implements all the methods required of a scuttlebutt
  return {
    get: get,
    set: set,
    /* . . . */
  };
};

This breaks because the constructor for the object we're returning is, well, Object. Note that returning scuttlebutt does give you the Scuttlebutt constructor, but that it still won't have the same methods as the object returned by our factory thing.

So what happens if you refactor it to use a constructor? Let's see:

var SomeModel = function (opts) {
  this.coolThing = coolThing(opts);
};
util.inherits(SomeModel, Scuttlebutt);

SomeModel.prototype.get = get;
SomeModel.prototype.set = set;
/* . . . */

This breaks because Scuttlebutt#clone has no way of knowing what arguments were passed into the original. In the second instance, this.coolThing gets nothing passed to it, so your copies have different properties. Oops?

What if you refactor so that you have to set properties on SomeModel by-hand? Like, someModel = new SomeModel(); someModel.coolThing = coolThing(opts); Still breaks, for similar reasons as the case where we don't use a constructor: Scuttlebutt#clone has no way of knowing what modifications you made to the original object after being instantiated, so the clone doesn't have them applied to it.

Back to expiry-model: expiry-model uses lru-cache to manage data storage and expiration, and scuttlebutt to handle events and streaming. That means that you have to be able to configure it somehow, since every case of using a lru cache needs different parameters. Since passing options and setting properties both break, all you can really do is use .get() and .set() on your scuttlebutt, hope you don't have weird collisions, and beat the crap out of your libraries until they're smart enough to reconfigure lru-cache properly each time you do a .set(). But, in the case of expiry-model, the configs would, well, expire, without extra special handling. Oops. XD

"But Josh what if you just don't use clone? Jake never used clone so he never ran into that issue!", you say, in the hypothetical situation where I get to be my own Mary Sue. The thing is, you might not be using clone, but other libraries which apply to general Scuttlebutt instances may use clone, and in fact given that it's a core method it's not really realistic to expect people NOT to. Plus, the ability to do it is pretty handy.

For my use case, I realized that level-replicate was the library I actually wanted. But hey, food for thought.

dominictarr commented 10 years ago

thank you. this level of detail really helps me understand the nature of the problem. and the colourful language also adds interest without distracting from the technical details.

Since you decided to use level-replicate instead, I'll probably just leave this issue open, but I will explain the purpose of clone, and how this problem may be addressed if someone encounters it again in future.

I added clone when building level-scuttlebutt. With leveldb you can store thousands of scuttlebutt instances, but if kept them all in memory it would be a leak. so, level-scuttlebutt needed a way to dispose of scuttlebutt instances, but bring them into memory as well.

You can call .dispose() on a scuttlebutt and it destroys all streams, and level-scuttlebutt will stop tracking it also. When you do scuttlebuttBb.open('foo') it creates a scuttlebutt attached to the database, and then clones it, so that the user does not directly touch the databases instance.

if more instances are opened to stream to clients, level-scuttlebutt creates a new clone for each, and disposes of them when that instance is disposed of, or when all it's streams are ended. if all the clones are removed, then the original instance is disposed of, and removed from memory.

Either level-scuttlebutt, expiry-model or both will need to be refactored to handle this case.

This is also a question about what you are removing from memory.

the problem with using lru-cache with scuttlebutt is that it's non-deterministic. if you do lots of em.get('foo') it will be kept in the cache, but maybe not on another.

It would be more predictable to dispose of a key after a timeout from it's write time, which is recorded in the history, and all instances could make the same decision.

that would still require some configuration (what maxAge), though. I think the simplest way to handle that is to wrap 'expiry-model' in a thing that had a proper constructor but kept the options in a closure.

module.exports = function createEm (opts) {
  function emWrapper ()
    //wraps em in some way...
  }
  return new emWrapper()
}

or maybe level-scuttlebutt could be aware of a factory method and use that to clone instead. or expiry-model could wrap the clone method in something that worked right, and preserved options. scuttlebutt could be refactored to have a clone and a _clone method to make this easier. _clone would handle everything after the instance is created, allowing the author to create a subclass that is created in their own weird way.

jfhbrook commented 10 years ago

Yeah, how I got around this, kind of, was to make a constructor where properties were read off the object, and then do something like:

var SpecialCase = function () {
  GeneralCase.call(this);
  this.someProperty = 'some non-default value';
}
util.inherits(SpecialCase, GeneralCase);

It works but ain't pretty.

jfhbrook commented 10 years ago

I added clone when building level-scuttlebutt. With leveldb you can store thousands of scuttlebutt instances, but if kept them all in memory it would be a leak. so, level-scuttlebutt needed a way to dispose of scuttlebutt instances, but bring them into memory as well.

I would document that better if I were you. This is the major reason I realized level-scuttlebutt was a bad fit for what I was doing: My scuttlebutts were going to be relatively monolithic, always-open and very large, such that all my shit would necessarily be in-memory.

or expiry-model could wrap the clone method in something that worked right

That might work! But again, you'd have to document how/why you would do this somewhere.

jfhbrook commented 10 years ago

the problem with using lru-cache with scuttlebutt is that it's non-deterministic. if you do lots of em.get('foo') it will be kept in the cache, but maybe not on another.

It would be more predictable to dispose of a key after a timeout from it's write time, which is recorded in the history, and all instances could make the same decision.

Yeah, good point. Perhaps a separate issue for expiry-model?

dominictarr commented 10 years ago

hmm, yes I think so. one solution might be to have expiry-model automatically reinsert old keys that are being read a lot. that would freshen their timestamp, however, since you are probably using expiry model for something like chat, it is probably good enough (considering user's expectations) to just expire keys by their age.

lru-cache on the other hand, is not about user expectations. It's about efficient resource use. Also, lru-cache expects that you have another more reliable resource behind it, for when the cache misses. So, it's probably not sensible to use in the way expiry model is using it.

/cc @Raynos