couchbase / couchbase-lite-net

A lightweight, document-oriented (NoSQL), syncable database engine for .NET
http://developer.couchbase.com/mobile/
Apache License 2.0
438 stars 126 forks source link

CB Lite 2.0 – Replication Error 400 Bad Request #908

Closed goersen closed 7 years ago

goersen commented 7 years ago

Hi,

for reference: I already posted about this issue in the couchbase forums: https://forums.couchbase.com/t/cb-lite-2-0-replication-error-400-bad-request/14207

Some Basic Info:

The Problem: Well, the most obvious problem is that the sync is not working. When creating a new document via .NET, it is not pushed into the bucket. Also I put a document into the bucket via iOS-App/Python-Backend and it is not pulled (db.GetDocument returns null). When I break in “onReplicationCompleted” the “replicationEvent” also contains a LastError: LastError = {LiteCore.LiteCoreException: LiteCoreException (400): Bad Request}

Also the console in which the sync-gateway runs doesn’t even show a connection-request. When I connect via browser (http://our.couchbase.server.ip:4985/travel-sample/_all_docs?include_docs=true), iOS-Demo or Python-Backend it shows the existing document correctly and in the console of the running sync-gateway I see the connection-requests.

What I’ve tried:

Code in the “MainPage.xaml.cs” of the .NET Standard Library:

    public MainPage()
    {

        this.db = new Database("sync-db");

        Uri url = new Uri("ws://our.cb.server.ip:4984/travel-sample");
        var config = new ReplicatorConfiguration(db, url);
        config.Continuous = true;
        config.ReplicatorType = ReplicatorType.PushAndPull;
        config.Authenticator = new BasicAuthenticator("demo", "password");
        config.Options.Channels = new List<String>();
        config.Options.Channels.Add("channel.demo");

        this.replication = new Replicator(config);
        this.replication.StatusChanged += this.onReplicationCompleted;
        this.replication.Start();

        InitializeComponent();
    }

    void onReplicationCompleted(object sender, ReplicationStatusChangedEventArgs replicationEvent)
    {
        if (replicationEvent.Status.Activity == ReplicatorActivityLevel.Stopped)
        {
            Console.WriteLine("Replication has completed.");
        }
    }

Sync Gateway Config: We are using the sync-config from the working iOS 2.0 sample: (source: https://github.com/couchbaselabs/mobile-travel-sample/blob/master/sync-gateway-config-travelsample.json)

{
      "interface":":4984",
      "adminInterface":":4985",
      "log": ["*"],
      "databases": {
        "travel-sample": {
          "import_docs": "continuous",
          "unsupported": {
            "enable_extended_attributes": true,
            "replicator_2":true
          },
          "bucket":"travel-sample",
          "server": "http://localhost:8091",
          "username": "***",
          "password": "***",
          "users":{
              "admin": {"password": "password", "admin_channels": ["*"]}, 
              "demo": {"password": "password"},
              "tester": {"password": "password"}
          },
        "sync": `
    function sync(doc, oldDoc) {

    /* Just ignore all the static travel-sample files */
      var type = getType();
      if (type == "landmark" || type == "hotel" || type == "airport" || type =="airline" || type == "route") {
             return;
      } 
      /* validation */
       var username = getUserName();
       // Verify the user making the request is the same as the one in doc's username
       requireUser(doc.username);

       if (!isDelete()) {
          // Validate required fields.
          validateNotEmpty("username", doc.username);

          if (isCreate()) {

            // Validate that the _id is prefixed by owner.
            var expectedDocId = "user" + "::" + doc.username;

            if (expectedDocId != doc._id) {
                throw({forbidden: "user doc Id must be of form user:userId"});

            }

          } else {
             // Validate that the username hasn't changed.
            validateReadOnly("username", doc.username, oldDoc.username);
          }

        }

      /* Routing */
      // Add doc to the user's channel.
      channel("channel." + username);

      // Give user read access to channel
       if (!isDelete()) {
        // Deletion of user document is essentially deletion of user
           access(username,"channel." + username)
       }

      function getType() {
        return (isDelete() ? oldDoc.type : doc.type);
      }

      function getUserName() {
        return (isDelete() ? oldDoc.username : doc.username);
      }

      function isCreate() {
        // Checking false for the Admin UI to work
        return ((oldDoc == false) || (oldDoc == null || oldDoc._deleted) && !isDelete());
      }

      function isUpdate() {
        return (!isCreate() && !isDelete());
      }

      function isDelete() {
        return (doc._deleted == true);
      }

      function validateNotEmpty(key, value) {
        if (!value) {
          throw({forbidden: key + " is not provided."});
        }
      }

      function validateReadOnly(name, value, oldValue) {
        if (value != oldValue) {
          throw({forbidden: name + " is read-only."});
        }
      }

    }
          `
        }
      }
    }
bdgza commented 7 years ago

We get the same error using .NET 4.7+.NET Standard 2.0 on Windows Forms.

borrrden commented 7 years ago

Could you also post the code for creating the documents?

goersen commented 7 years ago

This is from before we switched to the travel-sample so I'm not 100% sure it should work with the much more complicated travel-sample-sync-function (but the documents should still go to the server, right?):

    void TestClick(object sender, EventArgs e)
    {
        var newTask = new Document();
        newTask.Set("type", "task");
        newTask.Set("owner", "todo");
        newTask.Set("createdAt", DateTimeOffset.UtcNow);
        this.db.Save(newTask);

        var query = Query.Select()
        .From(DataSource.Database(this.db))
        .Where(
            Expression.Property("type").EqualTo("task")
        );

        var rows = query.Run();
        int resultCount = rows.Count;
    }

I also tried to start the replication again after triggering db.Save, but it again only gets 400 bad Error, no requests showing up in the running sync-gateway console and no new document on the server. I also figured that Continuous = True should keep the sync running.

The code to create the documents which is used in iOS-swift & Python are from the respective repositories:

iOS (https://github.com/couchbaselabs/mobile-travel-sample/blob/master/ios/TravelSample/TravelSample/Presenter/BookingPresenter.swift):

func addFlightBookings(_ flights:Bookings, handler:@escaping(_ error:Error?)->Void) {

    guard let db = dbMgr.db else {
        handler(TravelSampleError.DatabaseNotInitialized)
        return
    }

    guard  let docId = userDocId else {
        handler(TravelSampleError.UserNotFound)
        return
    }

    if var flightDocument = db.getDocument(docId) {

        _bookings = flightDocument.array(forKey: "flights")?.toArray() as? Bookings ?? []
        _bookings.append(contentsOf: flights)

        _bookings = _bookings.map({ (orig)  in
            var newBooking:Booking = orig
            newBooking["date"] = orig["date"]
            newBooking["destinationairport"] = orig["destinationairport"]
            newBooking["flight"] = orig["flight"]
            newBooking["price"] = orig["price"]
              newBooking["name"] = orig["name"]
            newBooking["sourceairport"] = orig["sourceairport"]
            return newBooking
        })
        flightDocument.setValue(_bookings, forKey: "flights")
        do {
            try db.save(flightDocument)
              handler(nil)
        }
        catch {
            handler(error)
            return
        }
    }
    else {
        handler(nil)
        return
    }

}

Python (https://github.com/couchbaselabs/try-cb-python/blob/5.0/travel.py):

elif request.method == 'POST':
    userdockey = make_user_key(username)

    token = jwt.encode({'user': username}, 'cbtravelsample')
    bearer = request.headers['Authorization'].split(" ")[1]

    if token != bearer:
        return abortmsg(401, 'Username does not match token username')

    newflights = request.get_json()['flights'][0]

    try:
        db.mutate_in(userdockey,
                     SD.array_append('flights',
                                     newflights, create_parents=True))
        resjson = {'data': {'added': newflights},
                   'context': 'Update document ' + userdockey}
        return make_response(jsonify(resjson))
    except NotFoundError:
        return abortmsg(500, "User does not exist")
    except CouchbaseDataError:
        abortmsg(409, "Couldn't update flights") 
borrrden commented 7 years ago

I tried initially to reproduce the error, but the sync function was rejecting arbitrary documents (403) because they weren't structured correctly. I have yet to reproduce the 400, but I'll try using the code you provided and if need be just remove the sync function. That being said, trivial syncs involves channel pulls are working so far for me in .NET Core and .NET 4.7 Console both.

goersen commented 7 years ago

Oh, and I don't know if this helps, but if I break in onReplicationCompleted after replicationEvent.Status.Activity == ReplicatorActivityLevel.Stopped the replicationEvent.Status.Progress.Total as well as replicationEvent.Status.Progress.Completed are both 0.

goersen commented 7 years ago

Hi borrrden, thanks!

Hm, this is weird... we only switched to the travel-sample from an originally much more trivial implementation because it worked in iOS, so I thought this would reduce possible errors on our side (like a faulty sync-config).

Do you have a trivial setup, which we could try?

goersen commented 7 years ago

Okay, I've made this much easier to reproduce:

Server Setup

  1. Create a Bucket "sync-test"

  2. Start the sync-gateway with the following config (replace username and password):

    {
    "interface": ":4984",
    "adminInterface": ":4985",
    "databases": {
        "sync-db": {
            "log": ["*"],
            "logFilePath": "/home/sync_gateway/log",
            "server": "http://localhost:8091",
            "bucket": "sync-test",
            "enable_shared_bucket_access": true,
            "enable_extended_attributes": true,
            "import_docs": "continuous",
            "username": "***",
            "password": "***",
            "sync": `function(doc) {channel("testchannel");}`,
            "users": {
                "GUEST": {"disabled": false, "admin_channels": ["*"]},
                "testuser": {"password": "foo", "admin_channels": ["*"]}
            },
            "unsupported": {
                "replicator_2": true
            }
        }
    }
    }
  3. Via the admin interface add a new document with the id "test123" and content {"foo": "bar"}. This generates a document like this:

    {
    "_id": "test123",
    "_rev": "1-cd809becc169215072fd567eebd8b8de",
    "_sync": {
    "rev": "1-cd809becc169215072fd567eebd8b8de",
    "sequence": 3,
    "history": {
      "revs": [
        "1-cd809becc169215072fd567eebd8b8de"
      ],
      "parents": [
        -1
      ],
      "channels": [
        [
          "testchannel"
        ]
      ]
    },
    "channels": {
      "testchannel": null
    },
    "cas": "",
    "time_saved": "0001-01-01T00:00:00Z"
    },
    "foo": "bar"
    }

 

Simple .NET Console-Project

  1. Create new Project
  2. Select Console App (.NET Framework) -- with .NET Framework 4.7.1
  3. Add the Couchbase db017 Nuget to the project
  4. Use the following sample-code (replace namespace and sync-gateway-ip):
    
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;

using Couchbase.Lite; using Couchbase.Lite.Sync;

namespace cb_test_3 { class Program { static void Main(string[] args) { Couchbase.Lite.Support.NetDesktop.Activate();

        Database db = new Couchbase.Lite.Database("my-local-db");

        Uri url = new Uri("ws://192.168.0.57:4984/sync-db");
        ReplicatorConfiguration config = new ReplicatorConfiguration(db, url);
        config.Continuous = true;
        config.ReplicatorType = ReplicatorType.PushAndPull;
        config.Authenticator = new BasicAuthenticator("testuser", "foo");
        config.Options.Channels = new List<String>();
        config.Options.Channels.Add("testchannel");

        Replicator replication = new Replicator(config);
        replication.StatusChanged += Program.onReplicationCompleted;
        replication.Start();

        Console.ReadLine();

        Document result = db.GetDocument("test123");
        System.Diagnostics.Debug.Print(result.GetString("foo"));

        Console.ReadLine();
    }

    static void onReplicationCompleted(object sender, ReplicationStatusChangedEventArgs replicationEvent)
    {
        if (replicationEvent.Status.Activity == ReplicatorActivityLevel.Stopped)
        {
            Console.WriteLine("Replication has completed.");
        }
    }
}

}

4. Run the program
5. Wait a bit when it waits due to `Console.ReadLine()` -- Look at the running sync-gateway to see if any connection gets logged
6. Press any key to continue the program

**Resulting Behaviour:**
1. Nothing happens in the console of the running sync-gateway
2. When the Replication-Status reaches "stopped", `replicationEvent` contains an Error `Bad Request (400)`
3. When you continue with any key, the document `result` is `null`.
4. Due to `result` being `null` the program obviously crashes.

**Additional Detail**
This is run in Visual Studio 2017 on Windows 10 with the Fall Creator Update

&nbsp;

## Equivalent Swift Example
1. Create a new iOS App
2. Add the CouchbaseLite Framework
3. Use the following code for `ViewController.swift` (replace `sync-gateway-ip`):

// ViewController.swift // cb_ios_test

import UIKit import CouchbaseLiteSwift

class ViewController: UIViewController {

fileprivate var _pushPullRepl:Replicator?
fileprivate var _database:Database?

override func viewDidLoad() {
    super.viewDidLoad()

    couchbaseTest()
}

override func didReceiveMemoryWarning() {
    super.didReceiveMemoryWarning()
}

func couchbaseTest() {
    do {
        _database = try Database(name: "my-local-db")
        let url = URL(string: "blip://192.168.0.57:4984/sync-db")!

        var replConfig = ReplicatorConfiguration(database: _database!, targetURL: url)
        replConfig.replicatorType = .pushAndPull
        replConfig.continuous = true
        replConfig.authenticator = BasicAuthenticator(username: "testuser", password: "foo")
        let userChannel = "testchannel"
        replConfig.channels = [userChannel]

        _pushPullRepl = Replicator.init(config: replConfig)
        _pushPullRepl?.start()

    } catch let error as NSError {
        NSLog("Cannot open the database: %@", error);
    }
}

@IBAction func onButtonClick(_ sender: Any) {
    let result = _database?.getDocument("test123")
    NSLog((result?.string(forKey: "foo"))!)
}

}


4. Add a Button and bind it to the `onButtonClick` function

**Behaviour:**
1. After the app starts, a connection-request and sync is visible in the running sync-gateway console
2. After waiting a bit until sync is finished and then clicking the button, the `result`-object in the `onButtonClick` function contains the correct document
3. After waiting a bit until sync is finished and then clicking the button, `"bar"` is logged in the Xcode-terminal (which is the expected behaviour)
borrrden commented 7 years ago

I'm sorry to say that your program works without problem for me. The document is pulled and "bar" is logged to the debug output. I am also running Visual Studio 2017, but not the Fall Creator Update (which seems to have just recently been released). I am on 15063.608.

What version of Sync Gateway are you running (you said 1.5, but what is the build number that is reported in the log. Beta 2 should be 494, Beta 1 is 377). You can find this on the first line of the log. Mine says

2017-09-23T09:43:25.572+09:00 ==== Couchbase Sync Gateway/1.5.0(494;5b6b5b9) ====

I am also using an internal build of Couchbase Server 5.0, but I doubt that the problem lies there...if all else fails though then I will grab the same build number as you (Let me know what it is, you can find it on the top right of the dashboard page of Couchbase Server).

goersen commented 7 years ago

:( This is kinda disappointing :(

Yeah, seems like we also use the beta 2 build: 2017-09-23T15:16:14.960+02:00 ==== Couchbase Sync Gateway/1.5.0(494;5b6b5b9) ====

We now tested it on 3 different Windows 10 PCs and with different .Net Framework Versions (4.6.2 - 4.7.1):

  1. Windows Fall Creator Update Version 1703 (Build 16281.1000) -- Insider Build 16281.rs3_release
  2. Normal "most up-to-date" Windows without Insider -- Version 1703 (Build 15063.608)
  3. An older Windows Version -- Version 1607 (Build 14393.1593)

All have the same result :(. I also really doubt that it would have something to do with the CB-Server, because the websocket connection just gets closed unexpected or unclean and doesn't even show up in the sync-gateway-console. Also the sync-gateway <--> CB-server connection seesms to work, since the Swift-App does work. But of course, as always with software, you never know ;).

I also disabled the Firewall etc. to exclude any network problems. It's also not a "general" problem, since it works without problems in the Switft-App.

I realized, I didn't paste the log from the .NET Console Program. It also shows the Error:

15:07:55.929128| [BLIP]: {1}--> ->ws:192.168.0.57:4984/sync-db/_blipsync
15:07:55.929814| [BLIP]: {1} Opening connection...
15:07:55.930173| [Actor]: Starting Scheduler<07141C08> with 8 threads
15:07:55.931808| [Actor]: BLIP[->ws:192.168.0.57:4984/sync-db/_blipsync] performNextMessage
15:07:55.932228| [Actor]: Repl->ws:192.168.0.57:4984/sync-db/_blipsync performNextMessage
15:07:55.932645| [Actor]: BLIP[->ws:192.168.0.57:4984/sync-db/_blipsync] performNextMessage
15:07:55.933084| [Actor]: BLIP[->ws:192.168.0.57:4984/sync-db/_blipsync] performNextMessage
15:07:55.933599| [Actor]: BLIP[->ws:192.168.0.57:4984/sync-db/_blipsync] performNextMessage
15:07:55.933998| [Actor]: BLIP[->ws:192.168.0.57:4984/sync-db/_blipsync] performNextMessage
15:07:55.934368| [Actor]: BLIP[->ws:192.168.0.57:4984/sync-db/_blipsync] performNextMessage
15:07:55.934749| [Actor]: BLIP[->ws:192.168.0.57:4984/sync-db/_blipsync] performNextMessage
15:07:55.935132| [Actor]: BLIP[->ws:192.168.0.57:4984/sync-db/_blipsync] performNextMessage
15:07:55.935501| [Actor]: BLIP[->ws:192.168.0.57:4984/sync-db/_blipsync] performNextMessage
15:07:56.035730| [Actor]: Repl->ws:192.168.0.57:4984/sync-db/_blipsync performNextMessage
15:07:56.036597| [WS] WARNING: {2}--> ws:192.168.0.57:4984/sync-db/_blipsync
15:07:56.037011| [WS] WARNING: {2} Unexpected or unclean socket disconnect! (reason=WebSocket status, code=400)
15:07:56.037479| [WS]: {2} sent 0 bytes, rcvd 0, in 0.000 sec (-nan(ind)/sec, -nan(ind)/sec)
15:07:56.037813| [Actor]: BLIP[->ws:192.168.0.57:4984/sync-db/_blipsync] performNextMessage
15:07:56.038335| [BLIP]: {1} Closed with WebSocket status 400: Bad Request
15:07:56.038738| [Actor]: Repl->ws:192.168.0.57:4984/sync-db/_blipsync performNextMessage
15:07:56.039126| [Sync]: {3}--> Repl->ws:192.168.0.57:4984/sync-db/_blipsync
15:07:56.039482| [Sync]: {3} Connection closed with WebSocket status 400: "Bad Request"
15:07:56.039851| [Sync] ERROR: {3} Got LiteCore error: Bad Request (7/400)
15:07:56.040288| [Sync]: {3} now stopped
15:07:56.040670| [Actor]: DB->ws:192.168.0.57:4984/sync-db/_blipsync performNextMessage
15:07:56.041102| [Sync]: {4}--> DB->ws:192.168.0.57:4984/sync-db/_blipsync
15:07:56.041441| [Sync]: {4} now idle
15:07:56.041792| [Actor]: Push->ws:192.168.0.57:4984/sync-db/_blipsync performNextMessage
15:07:56.042185| [Actor]: Pull->ws:192.168.0.57:4984/sync-db/_blipsync performNextMessage
15:07:56.042525| [Sync]: {5}--> Push->ws:192.168.0.57:4984/sync-db/_blipsync
15:07:56.042876| [Sync]: {5} now idle
15:07:56.045249| ~BLIPIO: Max outbox depth was 0, avg -nan(ind)
15:07:56.045717| [Actor]: Repl->ws:192.168.0.57:4984/sync-db/_blipsync performNextMessage
15:07:56.046217| [Sync]: {6}--> Pull->ws:192.168.0.57:4984/sync-db/_blipsync
15:07:56.046718| [Sync]: {6} now busy
15:07:56.047217| [Actor]: Repl->ws:192.168.0.57:4984/sync-db/_blipsync performNextMessage
15:07:56.047743| [Actor]: Repl->ws:192.168.0.57:4984/sync-db/_blipsync performNextMessage
15:07:56.132094| [Actor]: Repl->ws:192.168.0.57:4984/sync-db/_blipsync performNextMessage
Replication has Stopped.
15:08:09.320691| [Actor]: Repl->ws:192.168.0.57:4984/sync-db/_blipsync performNextMessage

I'm kinda out of ideas here :(. Anyone else has any idea what we could do/try to either get it working or helping to fix it?

[EDIT]: Our CB-Server Version is: Enterprise Edition 5.0.0 build 3217

goersen commented 7 years ago

Some more info:

The websocket definitely never gets opened correctly. I monitored the connections via Charles Web Proxy and nothing outgoing to the sync-gateway shows up, when running the console program.

When I monitor the connections from the swift-app, the web-socket communication get's logged as expected. Even when I open a Javascript-Websocket to the wrong URI (i.e. "ws:192.168.0.57:4984/sync-db" (without "_blipsync")), the sync-gateway doesn't show a connection-request, but it at least gets logged in charles (because the initial connection get's established, the server just doesn't answer with 101).

borrrden commented 7 years ago

By the way I’ve been using the latest master to test. In the meantime db017 was released. Does this also happen with db017?

goersen commented 7 years ago

Yeah, all tests with the console-program were done with db017 (it was released on Nuget on thursday, I believe). The setups from before that (initial report and with the travel-sample) was with db016.

Just to make sure I just uninstalled, cleaned and reinstalled the nuget, but with the same result.

bdgza commented 7 years ago

I updated NuGet to db017 in my application and get the same (error 400) result.

borrrden commented 7 years ago

This is the worst kind of situation for me. Can either of you put up a sync gateway endpoint that I can try to connect to to see if it happens when I connect to it?

Word or warning: I’ve been put off of 2.0 and onto 1.4.1 for the next two weeks (CBL .NET is a one person show) so my time on this will be limited :(.

goersen commented 7 years ago

I'll put up a sync-gateway on one of our servers.

Yeah, I know the feeling :( ... thank you so much for putting in the effort!

In the meantime: Could you give me some hints on how to set up an environment in which I could debug this more? I guess checking out and using the branch feature/api-v2? What about the lite-core library? (There is a missing download-link in the readme.md.) Would your guess that this originates in lite-core or rather lite-net?

I'm more than happy to delve in and see what triggers this. But I'm affraid I need some pointers on how to set the environment up.

borrrden commented 7 years ago

I am happy to give help on building as well, as it would be the easiest way to try to debug. You are correct that feature/api-v2 is what you need. This will get you the .NET code but you also need to build LiteCore. This is done via CMake and I wrote a blog post about it. It says "for .NET Core" but the same process is used for .NET Framework as there is no distinction at this level.

We don't have any public download links for LiteCore. I don't think the problem lies there (the same exact code is reused for the other platforms including the one that works for you). That being said, you can probably just copy the LiteCore.dll out of the nuget package and put it into the correct location (noted in the blog post) to start out debugging. It will be in the runtime/win7-x64 and runtime/win7-x86 folders.

WebSocketWrapper.cs / WebSocketTransport.cs should be of particular interest. Those two classes are what govern the actual moving of bits to and from web sockets, while the actual message construction takes place in LiteCore. You'll notice they basically just take arrays of bytes and either marshal them to LiteCore, or receive them from LiteCore and send them through the web socket instance.

borrrden commented 7 years ago

P.S. Thanks for being willing to debug (it's rare for me to have someone willing to do it)

goersen commented 7 years ago

Sure thing, no problem at all. I'll see when I can get to this. In the meantime we tried some older releases and FYI: db014 works perfectly.

bdgza commented 7 years ago

@goersen did you also downgrade the Gateway, or did you try db014 with Gateway v477?

goersen commented 7 years ago

@bdgza Same server-setup and same console-program. Just uninstall Nuget, clean and install db014. I think db015 also works for the console program. But if I remember correctly, that version had some Issues with our Xamarin setup.

goersen commented 7 years ago

Ok, I've now succesfully setup the example with the feature/api-v2 branch.

The good:: Nice project setup! Using the feature-branch was quite easy and nicely streamlined. Only VS wanted to inflict a tiny bit of pain, but it was quickly resolved. And the tip with using dlls from the Nuget for core-lite was a great time- (and probably nerve-) saver!

The weird: It works. No problem whatsoever. The Document gets pulled and can be accessed from the local DB. The only "problem" I would say there is, is that replicationEvent.Status.Activity == ReplicatorActivityLevel.Stopped never happens. It's just Connecting --> Busy --> nothing. Progress goes 0/0 --> 0/1 --> 1/1 --> 1/13 --> 13/13 (with only 1 document) and Idle only happens when there are no documents to sync (i.e. starting the program for the second time). So there is currently no real way to know when the replication is finished (or I'm missing it, which is totally possible ;)). But this is the same behaviour as in db014, I believe.

The only other difference I noticed is that the API for replication-config is a bit different from db016/db017: config.Options.Channels has to be replaced by config.Channels and "Options" is now private (good change by the way). I don't know when this was changed, so I cannot say if this gives any hints as to why the Nuget doesn't work, while the feature-branch does. The only other thing I could think of would be that an older version of lite-core was used? I really have no idea... :(

Anyway... for now, I have done some more "fun" fiddling with more dlls and built a Nuget from the branch which we'll use as a local Nuget.

When I can do anything further to investigate this, let me know any time :).

Best regards! Lars

borrrden commented 7 years ago

That is quite odd. A word of warning about the current master: The automated tests are currently failing (one test in the suite).

As far as the idle thing, that's the way it is designed (I think you set Continuous to be true?). Continuous replications will never stop running until you actually stop them. There is no way to tell when they are "done" because they are never "done". They will receive / send things in real time as needed.

The API change was something I did to align with the Swift and Java variants (they don't have an "options" dictionary and I only have it because I set up a construct that is easy for me to convert between dictionary and class for LiteCore).

I am not sure what could have changed in the meantime to make things work again...but if db018 comes out and it doesn't work again I am going to start wondering if it will only work on the machine that it is compiled on for some odd environment reason.

goersen commented 7 years ago

Sorry, you are of course right, about the Continues = true replication.

FYI: The Nuget I created also runs on other PCs which have the same problem with db016/db017. As mentioned before, db014 also works fine.

I'll test db018 again, when it releases and hopefully everything will be just fine :).

borrrden commented 7 years ago

I put out DB018 just now, FYI.

eporsche commented 7 years ago

Well it looks like I have the same problem here - db018 is not syncing with: [WS] WARNING: {2} Unexpected or unclean socket disconnect! (reason=WebSocket status, code=400) whereas db14 syncs as expected.

bdgza commented 7 years ago

db018 also does not fix the issue for me.

borrrden commented 7 years ago

Does building from the db018 commit as you did before work?

EDIT If it does, it means that it is probably not a code change but a build server change that is causing this issue. I'm not sure what could have changed though.

borrrden commented 7 years ago

I finally have a setup that reproduces this! I will be able to diagnose this further now.

borrrden commented 7 years ago

Alright I found the issue, and it is dependent on environment but in an interesting way. Luckily I can prevent it. When I generate the commit hash to embed inside of Couchbase.Lite.dll, it sometimes ends in a newline. This commit is embedded into the User-Agent header of the initial HTTP request for syncing, and as a result of the extra newline present at the end of the header string, a double new line (aka "end of HTTP message") gets written. Since this gets written before any of the web socket header information sync gateway rejects it as invalid. The fix is to trim the end of the header strings before writing them.

goersen commented 7 years ago

Wanted to chip in on this again, but I see you already fixed it :)

Awesome! Thanks Borrrden! Looking forward to be working with official CB-Releases again :)