Assert failing in WebSocketProvider.connect after repo.find

jessegrosjean commented 6 months ago

This reproduces the problem every time for me:

func testFind() async throws {
    let repo = Repo(sharePolicy: SharePolicy.agreeable)
    let websocket = WebSocketProvider(.init(reconnectOnError: false, loggingAt: .tracing))
    await repo.addNetworkAdapter(adapter: websocket)
    try await websocket.connect(to: URL(string: "wss://sync.automerge.org/")!)
    try await repo.find(id: .init())
}

The assert failure is in WebSocketProvider.connect at assert(peeredCopy == false).

heckj commented 6 months ago

Now I'm severly confused. I've got a PR that resolves the assertion failure, but as I'm testing it - with an equivalent bit of code for the pending PR (#85) that previously worked for me - now i'm very unclear on what the expected behavior is.

I'm feeling like I'm missing something here, big time. Not sure what the difference was, and I've tested against a local instance of Automerge-repo-sync-server as well as the current sync.automerge.org service.

@jessegrosjean look through the integration test I added in #93, and verify for me that it jives with what you're using - a variation of this bug with some explicit assertions embedded in it.

heckj commented 6 months ago

Actually, I have a theory about this. On the walk home, I remembered that in earlier work I filed an issue about getting first an "Unavailable" message from the remote server, followed by a sync request with content. I don't believe that's changed, but I may be calling the document "resolved" and handing back the result before I get the sync message coming in that would say "Oh wait, I've got something". I may need to re-think how that whole flow operates with that behavior, or push on that behavior if that's sending an erroneous message that it shouldn't.

I know that doesn't leave you in a great place with how to use this, and me just as confused - not to mention a pending question of "how long do we wait to know if it's really not found or it it is?"

jessegrosjean commented 6 months ago

@jessegrosjean look through the integration test I added in #93, and verify for me that it jives with what you're using - a variation of this bug with some explicit assertions embedded in it.

That integration test looks right to me. With that said I'm not 100% confident that what I'm doing with find really has the semantics that I want. It's a little unclear to me exactly how find is supposed to work. And hard to get into my head that my local repo and the web socket connected repo are peers... not client server.

This is also related to the forking discussion earlier. I will try to write up related thoughts/questions tomorrow.

I know that doesn't leave you in a great place with how to use this, and me just as confused - not to mention a pending question of "how long do we wait to know if it's really not found or it it is?"

I'm still just experimenting for fun when I should be working on other things, so don't worry about me :)

heckj commented 6 months ago

I'm digging through the tests that define repo behaviors at https://github.com/automerge/automerge-repo/blob/6ef25d8f300133bfefc71b44369cc222bcb43f0c/packages/automerge-repo/test/Repo.test.ts#L207

I'm not sure I'm reading it correctly, but how it reacts to a local JS repo appears to be different from what we're seeing with the behavior that creates a new document on find with a random DocumentId. These tests aren't 1:1 with our integration tests, as they're not interacting over the network, but that's seems to be the gist.

jessegrosjean commented 6 months ago

This is also related to the forking discussion earlier. I will try to write up related thoughts/questions tomorrow.

I've been posting more thoughts in #79. I think the summary related to this issue is that NSDocument probably doesn't want to be using full repo.find() implementation when loading in document data. In particular I don't think we want to wait for network timeouts with peers when loading data. Instead just want to check with local storage. Probably this is all best done with a new repo method that combines some of the existing create and some of the existing find behavior.

Of course still want find to generally work :)

heckj commented 6 months ago

I dug around through the Javascript code and explored more of the testing, and I'm more convinced now than ever that the "create a new document" result we saw earlier was incorrect, although I'm at a loss as to what I had in place that enabled that result.

The API of find() in the javascript side returns a DocHandle, notably different from what Swift returns, in that the JS DocHandle is an object that emits events there. The internal concept of find() is that the request is transitive, and may come back at a later time, so "unavailable" is a current state, but not a final state in that scenario. As such, the dochandle emits events for "unavailable" when a peer in that chain says "Sorry, no beuno", but remembers the request and if it's peers find something, it CAN return the document. So in the JS world, it's far more like an active subscription for a possible eventual document, rather than a fully resolved endpoint.

Based on that, and continuing to poke at the Integration tests, I thought the PR was worthy to merge - it fixed the assert issue you saw, cleaned up the logic a bit, and puts a stake in the ground as to expectations.

There's plenty of more questions about what/how automerge-repo does with it's transitive peer links, and how find can/should work - in particular, per https://github.com/automerge/automerge-repo/issues/343, I suspect the the JS WebSocker provider may be returning a temporary unavailable message before it's finished checking with any/all connected peers, but tracking that down it and verifying that for the JS side is writing a number of integration tests in JS to verify and provide a JS-local test scenario, so for the time being I'm leaving that alone.

automerge / automerge-repo-swift

Assert failing in WebSocketProvider.connect after repo.find #92