basho / basho_docs

Basho Products Documentation
http://docs.basho.com

Downgrading from Riak 1.4 #478

Closed coderoshi closed 10 years ago

coderoshi commented 11 years ago

I've noticed that we don't have straightforward documentation on downgrading from Riak 1.4. Yet, there are a few things in Riak 1.4 that make downgrading a bit more involved than just re-installing the 1.3 package.

I figured I'd send out this email to track downgrade-related topics in hopes of 1) getting a definitive list together, 2) informing CSEs about all the necessary steps, and 3) setting the stage for someone to turn this into a doc on docs.basho.com.

To my knowledge, the following are things to keep in mind when downgrading from Riak 1.4:

1 LevelDB change to allow level-1 SSTs to have overlapping ranges

This changes how LevelDB organizes data in Riak 1.4 in a way that's not compatible with 1.3. To my knowledge, data must be converted back into the old form when downgrading. The steps I've been given (by MvM) are as follows:

  1. Stop 1.4 node
  2. Move all .sst files in sst_1 directories of LevelDB vnodes into sst_0 directories
  3. Install 1.3 package but don't start the node
  4. Run repair on each vnode outside of Riak (see below)
  5. Start up Riak 1.3
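Step 2 could be scripted along these lines. This is only a minimal sketch, not a tested Basho tool: the function name `move_level1_ssts` is made up for illustration, the data root `/var/lib/riak/leveldb` in the usage note is an assumed install path, and the layout (one directory per vnode, each holding `sst_0` and `sst_1` subdirectories) is inferred from the step description above. Refusing to overwrite a file that already exists in `sst_0` is a safety choice of the sketch, not part of the original instructions. Run it only with the node stopped.

```python
import shutil
from pathlib import Path


def move_level1_ssts(leveldb_root):
    """Move every .sst file from <vnode>/sst_1 into <vnode>/sst_0.

    Returns the list of destination paths so the result can be reviewed.
    """
    moved = []
    for vnode_dir in sorted(Path(leveldb_root).iterdir()):
        src_dir = vnode_dir / "sst_1"
        dst_dir = vnode_dir / "sst_0"
        if not src_dir.is_dir():
            # Not a vnode directory, or it has no level-1 files.
            continue
        # Assumption: create sst_0 if it is missing rather than fail.
        dst_dir.mkdir(exist_ok=True)
        for sst in sorted(src_dir.glob("*.sst")):
            target = dst_dir / sst.name
            if target.exists():
                # Assumption: never clobber an existing level-0 file.
                raise FileExistsError(f"refusing to overwrite {target}")
            shutil.move(str(sst), str(target))
            moved.append(target)
    return moved
```

Calling `move_level1_ssts("/var/lib/riak/leveldb")` (adjust the path to your install) returns the moved files; the repair in step 4 still has to be run afterwards on each vnode.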

For the "run repair" step, one option is to compile the "leveldb_repair" tool from the LevelDB 1.3 source and use that (I don't believe we ship the tool with Riak). The other is to trigger repair from an Erlang shell that is separate from Riak itself. I believe this gist from Sparrow is the current state-of-the-art advice from CSEs on how to do repairs: https://gist.github.com/bsparrow435/2834473

In general, this is all really ugly. We should get an escript together that makes this easier. But step 1 is making sure we're all aware of this process and documenting it.

2 Downgrading binary object format

The new binary object format is disabled by default when upgrading (provided users keep their 1.3 app.config and do not deploy the new 1.4 settings; Chef/Puppet users often get this wrong and may end up with it enabled by accident). However, users who have enabled the new object format must convert their objects back to the old format before downgrading. Luckily, this is well documented in the release notes:

https://github.com/basho/riak/blob/1.4/RELEASE-NOTES.md#improved-binary-format

3 New PB/HTTP vector clock encoding change

This is unlikely to be an issue, but I'm listing it for completeness. 1.4 adds the option of not using zlib when encoding vclocks sent over PB/HTTP to clients. However, 1.4 still defaults to the existing zlib encoding, and we haven't documented how people can change that. So it's unclear whether people will change this in the wild. We didn't get enough testing in during the cycle to really recommend the change.

However, if someone does change to the new "raw" encoding format, they should change back to the "zlib" encoding and wait 10-20 minutes before downgrading. This ensures there are no client requests in flight where a client retrieved a vclock from Riak in the new encoding and later sends it back to Riak after we've downgraded to code that doesn't understand that encoding. In any case, the worst case here is that a few client requests fail and then things return to normal. So it's not the end of the world if people miss this.

Is there anything else I'm missing here about downgrading? If so, please respond so we have a single thread with "everything the world needs to know about downgrading 1.4" so we can work on educating users / adding documentation.

@jtuple

seancribbs commented 11 years ago

Minor change in config for multiple PB listeners vs. previous single-listener format.

%% 1.4
{riak_api, [
    {pb, [{"127.0.0.1", 8098}]}
]}.

%% 1.3
{riak_api, [
    {pb_port, 8098},
    {pb_ip, "127.0.0.1"}
]}.

lucperkins commented 10 years ago

We now have a rolling downgrades doc: http://docs.basho.com/riak/latest/ops/upgrading/rolling-downgrades/

Closing.