Open shino opened 8 years ago
I think do_repair_chunks/10
should be used instead of do_repair_chunk
at 903e939. And it looks like right fix regardless of the exact issue (though it's a kind of type mismatch that must be detected by dialyzer).
type mismatch that must be detected by dialyzer).
Yes, I tried -Wspecdiffs
, dialyzer ouput many lines ;)
But {error, written}
indicates many subtle situation, which you could find in machi_file_proxy:handle_write()
so maybe we should take deeper insight on this. I look like a patch for this issue (which is being tested now).
For debugging purpose, we'd better have context information passed from cr_client to file_proxy that a write is whether in an append (writes are not supposed to overlap in midtail, and it warns that something is wrong except trim) or in a repair (it's like a force-overwrite).
diff --git a/src/machi_cr_client.erl b/src/machi_cr_client.erl
index 0d74f61..0728e11 100644
--- a/src/machi_cr_client.erl
+++ b/src/machi_cr_client.erl
@@ -439,8 +439,10 @@ do_append_midtail2([FLU|RestFLUs]=FLUs, Prefix, File, Offset, Chunk,
%% We know what the chunk ought to be, so jump to the
%% middle of read-repair.
Resume = {append, Offset, iolist_size(Chunk), File},
- do_repair_chunk(FLUs, Resume, Chunk, [], File, Offset,
- iolist_size(Chunk), Depth, STime, S);
+ Chunks = [{File, Offset, iolist_size(Chunk), machi_util:checksum_chunk(Chunk)}],
+ {Reply, S1} = do_repair_chunks(Chunks, RestFLUs, Resume, [FLU], File,
+ Depth, STime, S, {ok, Chunks}),
+ {reply, Reply, S1};
{error, trimmed} = Err ->
%% TODO: nothing can be done
{reply, Err, S};
Although we have to use checksum given by client here, not generating by ourselves.
What I want to confirm/settle down first is what {error, written}
means.
Current FLU server side implementation, it is used in response to append/write requests to indicate "already different bytes are written". CR client code seems to consider it as "write attempts to midtails are already dome with the bytes I want to write by someone else, e.g. repair". Is it better to use different error atom to indicate two cases?
For debugging purpose, we'd better have context information passed from cr_client to file_proxy that a write is whether in an append (writes are not supposed to overlap in midtail, and it warns that something is wrong except trim) or in a repair (it's like a force-overwrite).
:+1:
It may be the same for reverse direction: FLU -> client. For exapmle, {error, {bad_epoch, FLUEpoch, RequestedEpoch}}
.
Currently the file_proxy returns ok
when a write request indicates exactly same data that a flu already has - that said, {error, written}
indicates something is wrong (a requested bytes to write is different from what I have in disk), where we shouldn't trigger repair. If {error, trimmed}
is returned then a repair should be triggered.
Spin off from #33 .
Superficial reason of
bad_return_value
is reply tag as explained by quick fix [1]. But it seems like the root cause is double write as other two errors found.bad_return_value
was occurred in call fromdo_append_midtail2
todo_repair_chunk
inmachi_cr_client
. The current code triggers repair by{error, written}
[2] but FLU returns ok when "correct" chunk is written [3].The quick fix [1] will hide conflict in data and probably returns false ok.
[1] https://github.com/basho/machi/commit/903e9395c8a32af8ae7440ab8d0fb4d170c93cfd [2] https://github.com/basho/machi/blob/master/src/machi_cr_client.erl#L434 [3] https://github.com/basho/machi/blob/master/src/machi_file_proxy.erl#L683-L696