apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

Go: getting a one-time error 1039 since removal of the R/O transactions commit #11621

Open gm42 opened 2 weeks ago

gm42 commented 2 weeks ago

Description of the problem

On current main, running this test case with the multi-version client disabled produces:

no initial delay
initial empty R/O tx: got read version 56590938127280 (using multiversion client = false)
second empty R/O tx: got read version 56590938127280 (using multiversion client = false)
initial delay of 0.05 seconds
initial empty R/O tx: got read version 56590938127280 (using multiversion client = false)
second empty R/O tx: got read version 56590938127280 (using multiversion client = false)

With multi-version client enabled:

no initial delay
initial empty R/O tx, failed to get future: FoundationDB error code 1039 (The protocol version of the cluster has changed) (using multiversion client = true)
second empty R/O tx: got read version 56591215314792 (using multiversion client = true)
initial delay of 0.05 seconds
initial empty R/O tx: got read version 56591215314792 (using multiversion client = true)
second empty R/O tx: got read version 56591215314792 (using multiversion client = true)

After reverting the change introduced in #11366:

--- b/bindings/go/src/fdb/database.go
+++ a/bindings/go/src/fdb/database.go
@@ -235,8 +235,9 @@ func (d Database) ReadTransact(f func(ReadTransaction) (interface{}, error)) (in

                ret, e = f(tr)

-               // read-only transactions are not committed and will be destroyed automatically via GC,
-               // once all the futures go out of scope
+               if e == nil {
+                       e = tr.Commit().Get()
+               }

                return
        }

With this revert applied, there are no errors in either case. The issue is specific to the multi-version client code, and my guess is that the first commit waits for some state in the C/C++ binding to be initialized.

Could it be something related to the GRV cache?

Possible solution

Ideally, when initializing network options, the Go binding should wait for that initialization to complete so that the initial 1039 error is avoided.
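Until the binding waits on its own, a caller could work around the one-time failure by retrying only on error 1039. The following is a minimal sketch of that idea, not the binding's behavior: codedError is a hypothetical stand-in for the binding's error type, retryOnVersionChanged is a made-up helper name, and the read-version fetch is simulated so the example runs without a cluster.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// codedError is a hypothetical stand-in for the Go binding's error type,
// which carries a numeric FoundationDB error code.
type codedError struct{ Code int }

func (e codedError) Error() string {
	return fmt.Sprintf("FoundationDB error code %d", e.Code)
}

// errClusterVersionChanged is the error code reported in the output above.
const errClusterVersionChanged = 1039

// retryOnVersionChanged (hypothetical helper) runs op up to attempts times,
// sleeping for delay between tries. It retries only when op fails with
// error 1039; any other error is returned immediately.
func retryOnVersionChanged(attempts int, delay time.Duration, op func() (int64, error)) (int64, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		v, err := op()
		if err == nil {
			return v, nil
		}
		var ce codedError
		if !errors.As(err, &ce) || ce.Code != errClusterVersionChanged {
			return 0, err // unrelated error: do not mask it
		}
		lastErr = err
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return 0, lastErr
}

func main() {
	calls := 0
	// Simulated read-version fetch: fails once with 1039, then succeeds.
	getReadVersion := func() (int64, error) {
		calls++
		if calls == 1 {
			return 0, codedError{Code: errClusterVersionChanged}
		}
		return 56591215314792, nil
	}

	v, err := retryOnVersionChanged(3, 50*time.Millisecond, getReadVersion)
	fmt.Println(v, err, calls)
}
```

In a real program the simulated closure would be replaced by the read-only transaction that fetches the read version, and the type check would be done against the binding's own error type. This only hides the symptom; it does not address why the first transaction fails.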

gm42 commented 2 weeks ago

FYI @johscheuer

gm42 commented 1 week ago

I have additionally found that using OpenWithConnectionString with a v6 FoundationDB cluster causes this assertion failure:

Assertion false failed @ /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbclient/MultiVersionTransaction.actor.cpp 2430:
addr2line -e libfdb_c.so-debug -p -C -f -i 0x124f75c 0x88ec6d 0x8b528b 0x8b5144 0x1299ce8 0x92ae36 0x1074e07 0x8973e4 0x7733a9 0xffffe64c36b118b4

Corresponding to:

    Reference<IDatabase> newDb;
    try {
        newDb = connectionRecord.createDatabase(client->api);
    } catch (Error& e) {
        // Create error currently does not return any error except for network not initialized,
        // which cannot happen at this point
        ASSERT(false);
    }

In that case the client is effectively stuck in the initializing state.

gm42 commented 1 week ago

I am proposing to add a method for client status: https://github.com/apple/foundationdb/pull/11627

This would let clients start using the database only after initialization is complete.
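As a rough illustration of how a caller might use such a status method, the sketch below polls a readiness check before issuing the first transaction. The shape of the check (a function returning a bool and an error) is an assumption made for this example and is not taken from the PR; waitUntilReady and errNotReady are hypothetical names, and the status source is simulated.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is returned when the client never reports ready.
var errNotReady = errors.New("client still initializing")

// waitUntilReady (hypothetical helper) calls ready up to maxPolls times,
// sleeping for interval between polls. It returns nil as soon as ready
// reports true, and stops early if ready itself returns an error.
func waitUntilReady(maxPolls int, interval time.Duration, ready func() (bool, error)) error {
	for i := 0; i < maxPolls; i++ {
		ok, err := ready()
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		time.Sleep(interval)
	}
	return errNotReady
}

func main() {
	polls := 0
	// Simulated status check (assumption): reports ready on the third poll.
	ready := func() (bool, error) {
		polls++
		return polls >= 3, nil
	}

	if err := waitUntilReady(10, 10*time.Millisecond, ready); err != nil {
		fmt.Println("giving up:", err)
		return
	}
	fmt.Println("client ready after", polls, "polls")
}
```

Bounding the number of polls matters here because, as noted above, a client connected to a v6 cluster can stay in the initializing state indefinitely; an unbounded wait would hang in that case.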