HWCloudEngine / hybrid_cloud


[Storage Gateway] Replicate journals #117

Status: Open. yinweiishere opened this issue 7 years ago.

yinweiishere commented 7 years ago

This work enables our 'replication' feature and includes:

  1. Research transmission protocols;
  2. Design the replication dialog between the local DR server and the remote DR server, covering failure cases;
  3. Enable parallel journal transmission;
  4. Ensure the correctness of journal consumption;
  5. Design the control path for replication and remote replay.
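Items 3 and 4 pull against each other: journals are sent in parallel but must be consumed in order on the remote side. Below is a minimal sketch of one common way to reconcile them. It assumes that each journal carries a monotonically increasing sequence number, which the issue does not specify. The local side sends journals concurrently and retries failed sends. The remote side buffers out-of-order arrivals, replays only the contiguous prefix, and ignores duplicates caused by retries. `Journal`, `ReplayBuffer`, `replicate` and the `send` callable are hypothetical names, and `send` stands in for the real gRPC streaming call.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Journal:
    seq: int        # assumption: monotonically increasing journal sequence number
    payload: bytes


class ReplayBuffer:
    """Hypothetical remote side: accepts journals in any order, replays them in seq order."""

    def __init__(self, first_seq: int = 0):
        self._next_seq = first_seq
        self._pending = {}              # seq -> Journal, arrived ahead of a gap
        self._lock = threading.Lock()
        self.replayed = []              # seqs in the order they were applied

    def receive(self, journal: Journal) -> None:
        with self._lock:
            if journal.seq < self._next_seq or journal.seq in self._pending:
                return  # duplicate (e.g. from a sender retry): receive is idempotent
            self._pending[journal.seq] = journal
            # Drain the contiguous prefix; a gap means an earlier journal is still in flight.
            while self._next_seq in self._pending:
                self._apply(self._pending.pop(self._next_seq))
                self._next_seq += 1

    def _apply(self, journal: Journal) -> None:
        # Placeholder for the real replay; runs under the lock, so replay is serialized.
        self.replayed.append(journal.seq)

    @property
    def acked_upto(self) -> int:
        """Highest seq replayed so far (-1 if none, when first_seq is 0)."""
        with self._lock:
            return self._next_seq - 1


def replicate(journals, send, max_workers=4, max_retries=3):
    """Hypothetical local side: send journals concurrently, retrying each on ConnectionError."""

    def send_with_retry(journal):
        for attempt in range(max_retries):
            try:
                send(journal)
                return
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every send and re-raises the first unrecovered failure.
        list(pool.map(send_with_retry, journals))
```

For example, `replicate([Journal(i, b"x") for i in range(10)], buf.receive)` with `buf = ReplayBuffer()` leaves `buf.replayed` in sequence order whatever the arrival order. A watermark such as `acked_upto` is the kind of value the remote DR server could report back in the replication dialog, so that after a failure the local side knows which journals were safely consumed.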
yinweiishere commented 7 years ago

Status update on behalf of Liangyu. Plan:

  1. Finish the gRPC performance test with multiple threads/multiple streams.
toddyLee commented 7 years ago

Status:

  1. gRPC performance test: finished.
     Test environment: 2 VMs, each 8U8G (8 vCPUs, 8 GB RAM); network: 10GbE, bw = 900+ MB/s; local disk I/O: 512 KB sequential write/read throughput ~600 MB/s, 512 KB random read throughput ~120 MB/s.
     Test: gRPC synchronous streaming with 2/5/10 threads/streams, transferring 100 × 64 MB journals.
     Result: throughput ~250 MB/s.
     Brief analysis: because the source site has to read several journal files at once, the I/O sequence becomes partly random, which lowers read performance. During the test, nmon showed the disk 100% busy with reads, so disk reads are likely the bottleneck, while gRPC itself performed well.
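For reference, 100 × 64 MB = 6,400 MB of journals at ~250 MB/s takes roughly 26 s per run. To check the read-bottleneck conclusion independently of gRPC, one could time the source-side reads alone with the same thread counts. Below is a minimal sketch that assumes the journals are plain files on the local disk. `read_journal` and `measure_read_throughput` are hypothetical helpers, not part of the project. Repeated runs will be inflated by the OS page cache unless caches are dropped between runs (on Linux, via `/proc/sys/vm/drop_caches`).

```python
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 512 * 1024  # 512 KB reads, matching the disk test above


def read_journal(path: str) -> int:
    """Read one journal file sequentially in CHUNK-sized reads; return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(CHUNK)  # may return fewer bytes than CHUNK; loop until EOF
            if not data:
                break
            total += len(data)
    return total


def measure_read_throughput(paths, threads):
    """Read all files with `threads` concurrent readers; return (total_bytes, MB/s)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        total = sum(pool.map(read_journal, paths))
    elapsed = time.perf_counter() - start
    return total, total / (1024 * 1024) / elapsed
```

Running it for `threads` in `(2, 5, 10)` over the same 100 journal files would show whether read-only throughput already sits near the ~250 MB/s seen end to end. That would support the view that the disk, not gRPC, is the limit.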

Plan for this week:

  1. Write the replication flow chart and class diagram.
toddyLee commented 7 years ago

Status on 2016-10-11:

  1. Replication flow chart (general process): done.

Plan for this week:

  1. Replication disable/enable (backup) design;
  2. Replication class diagram and coding.
toddyLee commented 7 years ago

Status on 2016-10-18:

  1. Replication disable/enable (backup) design: done.
  2. Replication class diagram and coding: not done.

Plan for this week:

  1. Since Ceph libs3 sometimes fails with "ConnectionFailed", test with Ceph librados to avoid these errors;
  2. Replication class diagram and coding.
toddyLee commented 7 years ago

Status on 2016-10-25:

  1. Tested libradosGW put/delete/list 100,000 times and could not reproduce the "ConnectionFailed" error; this is suspended for now, and we will keep trying to reproduce it.
  2. Replication class diagram: done; coding in progress.

Plan for this week:

  1. Replication coding; finish the basic functions this week.
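Below is a minimal sketch of a put/list/delete stress loop like the 100,000-iteration test above. It is written against a hypothetical `ObjectStore` interface, so the same loop could drive either a libs3-based or a librados-based adapter. Errors are counted by exception type rather than aborting the run, so a rare "ConnectionFailed"-style failure shows up in the totals instead of stopping the test. `ObjectStore`, `InMemoryStore` and `stress` are illustrative names only. With the Python `rados` bindings, an adapter would likely map onto `Ioctx.write_full`, `Ioctx.remove_object` and `Ioctx.list_objects`, but that mapping is an assumption to verify against the installed version.

```python
import collections
from typing import Iterable, Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal interface; a real adapter would wrap libs3 or librados."""

    def put(self, key: str, data: bytes) -> None: ...
    def delete(self, key: str) -> None: ...
    def list(self) -> Iterable[str]: ...


class InMemoryStore:
    """Stand-in store used only to exercise the harness."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def delete(self, key):
        del self.objects[key]

    def list(self):
        return list(self.objects)


def stress(store: ObjectStore, iterations: int, payload: bytes = b"j" * 4096):
    """Run put -> list -> delete `iterations` times; count failures by type."""
    errors = collections.Counter()
    for i in range(iterations):
        key = f"stress-{i}"
        try:
            store.put(key, payload)
            if key not in set(store.list()):
                errors["missing-after-put"] += 1
            store.delete(key)
        except Exception as exc:  # record and continue, so rare errors are counted, not fatal
            errors[type(exc).__name__] += 1
    return errors
```

Against the in-memory stand-in, `stress(InMemoryStore(), 100_000)` returns an empty `Counter()`. Against a real backend, any non-empty count (for example a `ConnectionError` entry from an adapter that maps "ConnectionFailed" to it) would indicate how often the error occurs.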