I forgot to mention that we did an "in place" upgrade. The data files were kept, but Riak was upgraded to 3.0.12. It is a test environment, but we plan to upgrade in production too.
So there was an update to snappy as part of 3.0.12 - https://github.com/basho/eleveldb/releases/tag/riak_kv-3.0.12. The bundled snappy went from 1.0.4 to 1.1.9 - https://github.com/basho/eleveldb/commit/6ef920211272a288316f0ffa21cb79350e63effc.
There should be LOG files within the leveldb backend folder for the problem partition (i.e. "/var/lib/riak/leveldb/11417981541647679048466287755595961091061972992"), which are normally quite verbose - so it would be worth looking through those logs at the time of the issue.
Thx, it was a good hint. Previously I did a repair, because when vnodes get stuck it is impossible to stop Riak gracefully.
It is definitely something with the compression.
I'll keep digging.
2023/04/13-08:47:52.157605 7f7d3a7a7700 Level-0 table #25200: started
2023/04/13-08:47:52.157609 7f7d2ef90700 running 2...
2023/04/13-08:47:52.157735 7f7d2ef90700 waiting 2...
2023/04/13-08:47:52.359207 7f7d3a7a7700 Level-0 table #25200: 9598847 bytes, 17832 keys Corruption: corrupted compressed block contents
2023/04/13-08:47:52.359259 7f7d3a7a7700 Waiting after background imm compaction error: Corruption: corrupted compressed block contents
2023/04/13-08:47:52.359275 7f7d2ef90700 running 2...
2023/04/13-08:47:52.359325 7f7d2ef90700 waiting 2..
I went back to look at the basic upgrade test https://github.com/basho/riak_test/blob/develop-3.0/tests/verify_basic_upgrade.erl - going from 3.0.9 to 3.0.12. This passed with the eleveldb backend, but that is because the default compression algorithm is lz4. When I switched the test to force snappy compression:
2023-04-14 11:05:17 =ERROR REPORT====
** State machine <0.2253.0> terminating
** Last event in was timeout
** When State == started
** Data == {state,730750818665451459101842416358141509827966271488,riak_kv_vnode,undefined,undefined,none,undefined,undefined,undefined,undefined,undefined,0}
** Reason for termination ==
** {function_clause,[{riak_kv_vnode,terminate,[{bad_return_value,{stop,{error,"Corruption: corrupted compressed block contents"}}},undefined] ....
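One way to force this (a minimal sketch, not the exact change to the test; the {Version, Config} form for rt:build_cluster/1 and the eleveldb compression application env key are assumptions here) is to override the compression value that the cuttlefish schema would otherwise derive from riak.conf:

%% Sketch only (assumed form): override the eleveldb application env so the
%% old-version nodes come up with snappy rather than the lz4 default.
%% 'previous' is riak_test's alias for the prior release under test.
Config = [{eleveldb, [{compression, snappy}]}],
Nodes = rt:build_cluster(lists:duplicate(4, {previous, Config})).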
So it does look like there is a fairly clear incompatibility issue, one that we could have picked up in testing.
I haven't been able to dig into the snappy history in any meaningful way to see if there is an interim version that could safely bridge between the two.
I'll have a think about workarounds that may help. One way would be to do the upgrade via rolling transfer rather than rolling restart (so you transfer data into the upgraded nodes, rolling one node in/out at a time), but this is a lot more time consuming (and in that case it may be worth waiting for 3.0.16 to get the final improvements to transfers).
Also, with regard to the configuration of lz4: looking at the cuttlefish logic for compression in the eleveldb schema, there is a translation operation:
https://github.com/basho/eleveldb/blob/develop/priv/eleveldb.schema#L174-L183
However, this translation operation isn't present in the eleveldb_multi.schema:
https://github.com/basho/eleveldb/blob/develop/priv/eleveldb_multi.schema#L115-L129
This might mean that compression_algorithm is ignored in multi-backends, as it doesn't get translated into the compression setting - and hence it defaults to snappy.
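If you want to double-check what a node actually ended up with, you can query the application env from riak remote_console - a quick sketch (the riak_kv multi_backend key name is my assumption of where the multi-backend definitions land):

%% value produced by the eleveldb.schema translation (expect snappy | lz4 | false)
application:get_env(eleveldb, compression).
%% multi-backend definitions held by riak_kv (key name assumed)
application:get_env(riak_kv, multi_backend).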
Thanks. In our case, the rolling transfer can take weeks. :/ Is there a tweak where I can simply swap 2 nodes? I mean +1 node, -1 node, so I would expect all partitions from nodeA to move to nodeB.
Using https://docs.riak.com/riak/kv/latest/using/admin/commands/index.html#replace is what I meant by rolling transfer. This allows you to set up a new node with the new code, then use cluster replace to replace an existing node, then update that node, replace another node, and so on.
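So per replaced node the cycle is roughly the following (placeholder node names; the exact syntax is in the linked docs):

# on a freshly installed node running the new code, stage a join and a replace:
riak-admin cluster join riak@existing-node
riak-admin cluster replace riak@old-node riak@new-node
# review and commit the staged changes, then wait for handoff before the next node:
riak-admin cluster plan
riak-admin cluster commit
riak-admin transfers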
Thx.
In Riak 3.0.12, vnodes started to hang in leveldb:write/3 and the process mailbox could grow to 10K messages. After a while the whole node starts to behave unstably. The queue size of the vnode does not decrease, even without load.
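For what it's worth, the mailbox growth can be watched from riak remote_console with something like the following (a sketch, assuming riak_core_vnode_manager:all_vnodes/0, which returns {Module, Index, Pid} tuples):

%% message_queue_len of every riak_kv vnode on the local node
[{Idx, process_info(Pid, message_queue_len)}
 || {riak_kv_vnode, Idx, Pid} <- riak_core_vnode_manager:all_vnodes()].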
Another interesting thing I noticed is that in the vnode state the compression is set to snappy, but in the config it is defined as lz4. It's not fully confirmed, but it seems that 3.0.9 doesn't have the issue. Is there any incompatibility between the snappy versions?