Closed tleyden closed 7 years ago
The functional tests are passing, looking into this test failure locally:
pytest -s -p no:sugar --mode=di --server-version=4.6.1 --sync-gateway-version=5e5fd9c835494586cf9d273138c1791bba843fb5 -k test_server_goes_down_rebuild_channels testsuites/syncgateway/functional/topology_specific_tests/
Mobile-testkit logs: https://gist.github.com/tleyden/31a233acf78ad60240058d6ffd09d920
From last jenkins runs, looks like there is only one failing functional test at this point.
On cc79e1c500c67b79779316b6828f44d61845d177, running this functional test locally:
pytest -s -p no:sugar --mode=di --server-version=5.0.0-2827
--sync-gateway-version=cc79e1c500c67b79779316b6828f44d61845d177
-k test_db_offline_tap_loss_sanity[bucket_online_offline/bucket_online_offline_default_dcp-100]
testsuites/syncgateway/functional/tests/
Fails w/ these logs:
https://gist.github.com/tleyden/3072f68c1ec35ded82ad47719dd4c5b7
The problem seems to be that after the https://github.com/couchbase/sync_gateway/commit/e796b55f7924531ce0de7355a69c336e53e3a349 change, the offline handling is not working in the above test scenario. It worked before https://github.com/couchbase/sync_gateway/commit/e796b55f7924531ce0de7355a69c336e53e3a349 because that was (erroneously) using the go-couchbase driver rather than the gocb driver when SG was acting as a channel index reader (feedtype=DCPSHARD).
After fixing that in https://github.com/couchbase/sync_gateway/commit/e796b55f7924531ce0de7355a69c336e53e3a349 there isn't any mechanism in place for Sync Gateway to look for the bucket going offline when using gocb in channel index reader mode.
There seems to be two main places where the bucket is detected to be offline:
The first one works for both gocb and go-couchbase, but only in channel cache mode
The second one only works for go-couchbase, and so it looks like we'll need to add similar functionality for the cases when using the gocb driver to connect to the couchbase bucket.
I don't think this is necessarily a problem. The motivation for #1 was to allow Sync Gateways running in channel cache mode to detect when they'd lost the mutation feed, and go offline (since otherwise the feed was failing silently, and didn't result in any errors). This was particularly important because the mutation feed didn't reliably restart when the bucket became available again (unlike KV operations).
When running in channel index mode, I don't think there's the same requirement. If the bucket connection is lost, individual requests will fail with error messages, and should recover on their own when the bucket connection is reestablished.
And just to restate - channel index readers shouldn't imply feedtype=DCPSHARD. It's more like feedtype=NONE, if anything.
And just to restate - channel index readers shouldn't imply feedtype=DCPSHARD. It's more like feedtype=NONE, if anything.
Yes, this threw me off. Here is the config file that mobile test-kit uses for the sync gateway node:
"databases":{
"db":{
"feed_type":"DCPSHARD",
"channel_index":{
"writer":false
}
}
}
That's a testkit issue. feed_type=DCPSHARD never needs to be specified in the config. Accel nodes are the only ones that use a sharded feed type, and they use that feed type by default (it doesn't need to be specified in the config).
The GoCB hybrid bucket was basically a crutch to scope limit the transition.
However, certain things that require type assertions get complicated, and it's probably better to drop the gocb hybrid bucket.
For example, when dealing with error handling, you might not know which bucket was actually used for the call to Couchbase Server, and so trying to figure out which kind of error it is gets complicated.
From last jenkins runs, looks like there is only one failing functional test at this point.
Failing functional tests
Base CC
Topology specific CC
Base DI
NOTE: test_continuous_changes_sanity[sync_gateway_default_functional_tests-10-10 failed in cen7-sync-gateway-functional-tests-base-di/496 but passed in a later rerun. More reruns in progress: rerun2 rerun3
Topology specific DI