apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

DR fails atomicity in 7.3.43 #11434

Open doublex opened 1 month ago

doublex commented 1 month ago

After updating FDB from 7.3.37 to 7.3.43, disaster recovery (DR) no longer preserves atomicity.

What we did:

fdbdr start --source /path/to/src.cluster --destination /path/to/dst.cluster
[wait until the DR is a complete copy of the primary database]
fdbdr abort --source /path/to/src.cluster --destination /path/to/dst.cluster

Older transactions are fine, but newer transactions are not atomic on the DR clone. For example, a secondary-index entry is present without its primary record.
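To make the symptom concrete, here is a minimal sketch of the kind of check that exposes it: scan a snapshot of the DR clone for index entries whose primary record is absent. The key layout (`("rec", id)` and `("idx", value, id)`) and the function name are hypothetical and not taken from our schema; the snapshot is a plain dict standing in for a range read of the destination cluster.

```python
# Hypothetical key layout (an assumption for illustration only):
#   primary record:  ("rec", record_id)              -> payload
#   secondary index: ("idx", field_value, record_id) -> b""

def find_dangling_index_entries(kv):
    """Return index keys whose record_id has no primary record in kv."""
    primary_ids = {key[1] for key in kv if key[0] == "rec"}
    return sorted(
        key for key in kv
        if key[0] == "idx" and key[2] not in primary_ids
    )

# Stand-in for the contents of the DR clone after `fdbdr abort`.
snapshot = {
    ("rec", 1): b"alice",
    ("idx", "alice", 1): b"",
    ("idx", "bob", 2): b"",  # index entry whose primary record is missing
}

dangling = find_dangling_index_entries(snapshot)
print(dangling)
```

If both writes belong to one transaction, a consistent copy should never yield a non-empty result here.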

If I remember correctly, this did not happen with 7.3.37. Is it safe to downgrade FDB to 7.3.37? If so, I could check.

Best wishes

doublex commented 1 month ago

fdbcli --exec status (primary database)

Using cluster file `/etc/foundationdb/fdb.cluster'.

Configuration:
  Redundancy mode        - single
  Storage engine         - ssd-2
  Log engine             - ssd-2
  Encryption at-rest     - disabled
  Coordinators           - 1
  Desired Commit Proxies - 3
  Desired GRV Proxies    - 1
  Desired Resolvers      - 1
  Desired Logs           - 3
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 1
  Zones                  - 1
  Machines               - 1
  Memory availability    - 24.0 GB per process on machine with least available
  Fault Tolerance        - 0 machines
  Server time            - 05/30/24 11:11:51

Data:
  Replication health     - Healthy
  Moving data            - 0.000 GB
  Sum of key-value sizes - 134.925 GB
  Disk space used        - 168.328 GB

Operating space:
  Storage server         - 735.9 GB free on most full server
  Log server             - 735.9 GB free on most full server

Workload:
  Read rate              - 1156 Hz
  Write rate             - 116 Hz
  Transactions started   - 774 Hz
  Transactions committed - 41 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 1 as primary

Client time: 05/30/24 11:11:51

jzhou77 commented 1 month ago

> Is it safe to downgrade FDB to 7.3.37? If so, I could check.

Yes.

doublex commented 1 month ago

@jzhou77 Thanks. v7.3.37 works perfectly. I will upgrade the sender and recipient step by step up to v7.3.43 to pin down the regression. That's probably all I can contribute.

doublex commented 3 weeks ago

fdb_dr keeps atomicity if the primary-database ("sender") is not v7.3.43.

v7.3.37 -> v7.3.37: works (transactions are atomic)
v7.3.37 -> v7.3.43: works
v7.3.41 -> v7.3.43: works
v7.3.43 -> v7.3.43: secondary-indexes written, primary records missing (some)
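The reasoning behind that conclusion can be sketched in a few lines: group the four runs by the version on one side and see which versions appear only in failing runs. The data is exactly the matrix above; the function name is just for illustration.

```python
# (source version, destination version, atomic on the DR clone?) as reported above.
results = [
    ("7.3.37", "7.3.37", True),
    ("7.3.37", "7.3.43", True),
    ("7.3.41", "7.3.43", True),
    ("7.3.43", "7.3.43", False),
]

def versions_always_failing(results, side):
    """Versions on one side (0 = source, 1 = destination) seen only in failing runs."""
    seen = {}
    for row in results:
        seen.setdefault(row[side], []).append(row[2])
    return sorted(v for v, outcomes in seen.items() if not any(outcomes))

print(versions_always_failing(results, 0))  # source side
print(versions_always_failing(results, 1))  # destination side
```

Only the source side singles out a version, since a v7.3.43 destination also appears in runs that worked. With four runs this is suggestive rather than conclusive.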

Is there anything I can do to help?