Priam version branching - do we need it?

jasobrown commented 11 years ago

When I took over the Priam project about a year ago, The TuneCassandra class had a lot of if/else branch statements to catch differences in the version of cassandra being used. Unfortunately this was error prone as commented out values in the yaml could cause the alg to not work correctly, as that was used for version detection. Also, the SSTableLoaderWrapper is dependent of the target version of c* being used as it directly compiles against it. Thus, I introduced the current branching strategy in priam to deal with both the yaml changes and c* interface changes.

The current mechanism for c* version detection is hard coded into the priam build. Typically we use the highest c* version at compile time (for example, c* 1.2.5). However, if you are using an earlier minor rev version of c* (say 1.2.1, for example), priam may try to write out any newer yaml values that would apply to the higher version, but would cause the earlier c* version to fail on startup. Thus, the current slution, while better than the previous, if still not a bullet proof as should be.

Living with it for a while now, it seems that perhaps we could have made things simpler overall by not branching the code base, but more intelligently handling the yaml changes in code. I'll add my initial ideas in forthcoming comments to this ticket.

One additional goal of this ticket is to eliminate the direct dependency on c* binaries, as that implies a specific version of c* to be used with priam, which we're trying to get away from. This would work fine for the yaml values, and some other minor code can be reworked to avoid the direct c* dependency, but the SSTableLoaderWrapper is heavily dependent on c* code. I'm not sure how to handle that issue yet (perhaps version-specific subproject(s)?).

Any and all suggestions/comments are welcome.

jasobrown commented 11 years ago

One idea for handling the yaml would be to make it pluggable based on the target version of c* being used.

We could determine the version of c* by reading the version directly from $CASS_HOME/lib/apache-cassandra*.jar, similar to how nodetool version does. (we'd have to be intelligent about determining the correct jar to interrogate, but seems a reasonable path).

Then we have some set of classes that each add/update/remove yaml settings based on the version desired. My initial thought is like this:

A BaseYaml which holds values for everything that has been constant across all version of c* (that we support)
Yaml_x_y - subclass of BaseYaml, an impl of each of the major c* versions (1.1, 1.2, 2.0) that holds values introduced in that major version
Yaml_x_y_z - a sorted collection to be held in each Yaml_x_y containing the incremental changes that occurred within the scope of the accompanying major version. Thus, if an yaml value is added or removed between c* minor version, these smaller implementations would make the appropriate changes.

You can think this idea being a filter chain.

I'm nor sure how far back in history we'd like to go with this idea (all the point revs in 1.1?), but perhaps it's not too difficult (just time consuming).

Thoughts? Thanks

timiblossom commented 11 years ago

+1 for plug-gable yaml handling components based on C* version

Currently we are facing branching issue that branches are too far away from the master. We can only selectively merge back to master some codes, not everything. This causes the merging process to be complex and error-prone. Resolving merging is like doing a one time on if/else inspection.

It would be a good idea to separate Priam out to the common components and plug-gable components. In this way, we can always do a "merge all" on the common components and leave the plug-gable components to live by themselves without any further merging. This would hopefully reduce all if/else inspection steps during branch merging and code reading.

pcn commented 11 years ago

It seems like Yaml_x_y_z accounts for the addition of features/flags/etc. but not for the removal of a flag (let's say 3.1.1 adds feature foo, it's broken, so 3.1.2 removes it, then it's restored in 3.1.6). If you accumulate features through the addition of x_y_z features as z increases, you're going to be in a tough spot.

In our use case, we don't let Priam handle the yaml file - we're on ubuntu, and I create a dummy file for priam to consume, but I write the file based on our chef configs. Have you considered pushing the yaml creation completely out of priam?

jasobrown commented 11 years ago

@timiblossom I'm less concerned with git merging complexity (as that is something we can learn and overcome, but have been not consistent with), than with getting a more understandable/maintainable code base. The branching is not really needed if we structure the code reasonably (esp. wrt the yaml).

@pcn Actually, I would account for removed props, but perhaps my explanation didn't convey that well enough. I imagine a map of the yaml values being passed to the x_y_z impls to which they could add/remove/modify values

To clraify a little more, what I propose is for each 'major' c* version, we have a set of x_y_z guys that handle the incremental yaml changes. Thus, for example, we'd have impls for 1.2.1, 1.2.3, 1.2.5, and so on. Then each time we go to write the yaml, we'd run though all the x_y_z impls for the major c* rev being run (up to the currently executing c* minor rev, but not further). Thus, if somebody's running 1.2.4, we'd execute 1.2.1 and 1.2.3 impls (in order!), but not the 1.2.5.

Let me know if this clarifies the proposal, else I'll draft some code to illustrate and we can pick holes in that :).

As to your proposal about not having priam manage the yaml at all, yes, I have thought about it (a lot). We at Netflix rely heavily on it, however that doesn't mean others need to be bound to it. Please file a priam ticket to make it a configurable option (I have an idea or two about how to implement it).

jasobrown commented 11 years ago

As an experiment, I commented out the cassandra-all dep from the build.gradle to see how many errors occurred during compilation. Two major categories: SSTableLaderWrapper, and anything related to nodetool. It is possible to move them into separate, per-c* major version subprojects. But it's not as small an amount of code as I thought it might be.

jasobrown commented 11 years ago

I've created a very loose (and probably uncompileable) code on the branch referenced in the previous comment. There's a few classes in the priam/src/main/java/com/netflix/priam/yaml/ directory that sketch out what I'm thinking. Note: I put that code together in under ten minutes, so it's far from elegant/complete, but should be enough to demonstrate the basic path.

Please comment if it's worthwhile, or a whole new approach is needed.

timiblossom commented 11 years ago

Looking at the sample code, it has both adding/removing for each major or minor version so it looks flexible. Based on this, I think Priam can also bind YamlGenerator dynamically by looking at C* version from a configuration source. The only disadvantage is that we have to maintain a hierarchy of generator classes but I guess it is worth the tradeoff.

jasobrown commented 11 years ago

Agreed about needing to build the hierarchy of classes being a slight downside.

Thinking about this a bit more, we'll have a bit more fun when integrating with Datastax Enterprise. They've backported some stuff from 1.2 (to 1.1.x) as well as adding in a few extra c*.yaml settings. Will have to think about how to integrate that here, as well.

jasobrown commented 11 years ago

I've decided that while the the properties stuff may be easier to manage in code, the other pieces like nodetool commands and any compilation dependencies on c* will get too weird to introduce in separate submodules. Thus, we'll keep things the way they are and continue on with the current branching.

Netflix / Priam

Priam version branching - do we need it? #218