ceramicnetwork / js-ipfs-ceramic

Wraps js-ipfs instance with dag-jose codec enabled.

Determine what is needed for pubsub messages to be received #15

Closed v-stickykeys closed 3 years ago

v-stickykeys commented 3 years ago

So far we are only seeing pubsub messages received on our infra's remote version of this node when we

1 - have fixed IPFS versions (as in https://github.com/ceramicnetwork/js-ipfs-ceramic/commit/59df199150894067d5dd27059a10669c3b41f867)

~2 - delete the config file created by ipfs~ (this isn't strictly necessary)

3 - restart the js-ceramic instance using this ipfs node

It's not exactly clear what the issue is, but it seems to be some inconsistent state/config between the ipfs node and the ipfs client in the ceramic node.

v-stickykeys commented 3 years ago

@zachferland Tried this again today and confirmed, we need steps 2 and 3 as mentioned above and it begins to work

v-stickykeys commented 3 years ago

Also noticed that when I made a change to infra pubsub stopped working.

In this case I changed the healthcheck endpoint for the ceramic ALBs. When the change was applied, the ceramic instances were restarted but the ipfs nodes were not.

Then I had to restart the ceramic tasks again and it started working.

zachferland commented 3 years ago

Hmm, yeah, we should definitely not need to delete the config again (or every time), and it's weird that there would be any issue making changes to or updating ceramic alone. I suspect there are still other compounding issues. But the issue described in Discord definitely holds.

The ceramic node only calls subscribe on start. If the ipfs node restarts after that, the ipfs node will no longer be subscribed to that topic and the ceramic node will no longer have a handler/connection to receive messages. Confirmed that behavior today (as expected). Even worse, in the ceramic node subscribe will not visibly fail or emit any events/errors if that connection is closed (i.e. ipfs is down); it will just stop 'receiving messages'.

Looked into it more and unfortunately there doesn't seem to be any great solution. There are no events or errors available. We can't necessarily even poll the topic list to see if we are still subscribed, because there can be cases where the ipfs node is subscribed to the topic again (from another client) but the subscribe handler in the ceramic node is no longer receiving messages, and there is no easy way to determine that. I think an error handler/event or something similar should be added to the ipfs http api lib so that the client knows it should start attempting reconnects.
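Until the client library exposes such an error hook, one possible client-side workaround is a silence watchdog: track when the handler last fired and, if the topic has been quiet for too long, drop the subscription and subscribe again. The sketch below is only an illustration. `createWatchedSubscription`, `checkOnce`, and `maxSilenceMs` are hypothetical names that do not come from js-ceramic or js-ipfs, and the only API it assumes is `ipfs.pubsub.subscribe(topic, handler)` and `ipfs.pubsub.unsubscribe(topic, handler)`. A topic that is legitimately quiet will trigger harmless extra resubscribes.

```javascript
// Hypothetical helper, not part of js-ceramic or js-ipfs.
// `ipfs` is any object exposing pubsub.subscribe / pubsub.unsubscribe.
function createWatchedSubscription(ipfs, topic, onMessage, maxSilenceMs) {
  let lastSeen = Date.now()

  // Wrap the caller's handler so every received message refreshes the timestamp.
  const handler = (msg) => {
    lastSeen = Date.now()
    onMessage(msg)
  }

  async function subscribe() {
    await ipfs.pubsub.subscribe(topic, handler)
    lastSeen = Date.now()
  }

  // Resolves to true when a resubscribe was performed, false otherwise.
  // Rejects if the new subscribe call fails (e.g. the ipfs node is still down).
  async function checkOnce(now = Date.now()) {
    if (now - lastSeen < maxSilenceMs) return false
    try {
      await ipfs.pubsub.unsubscribe(topic, handler)
    } catch (err) {
      // The old connection may already be gone; try to subscribe anyway.
    }
    await subscribe()
    return true
  }

  return { subscribe, checkOnce }
}

// Possible usage (interval and threshold are arbitrary example values):
//   const sub = createWatchedSubscription(ipfs, topic, onMessage, 30000)
//   await sub.subscribe()
//   setInterval(() => sub.checkOnce().catch(console.error), 10000)
```

This does not depend on the topic list, so it also covers the case where another client has resubscribed the ipfs node to the topic while this client's handler is dead.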

oed commented 3 years ago

Should this be raised to the js-ipfs team?

v-stickykeys commented 3 years ago

From testing today, I was still able to receive pubsub messages in js-ceramic after stopping the ipfs node, IF I continuously sent messages to the topic it was subscribed to. When I stopped sending messages and then started again after a ~30s delay, it stopped receiving them. Also, the node that I ran locally to send the messages continued to list the infra node as a peer with its peer id even after the infra node was down.

This seems to indicate

though I don't think this is a full picture of what is happening still.

v-stickykeys commented 3 years ago

@zachferland yeah, it seems you are right actually. Deleting the config is not a necessary step. I think it was more coincidental. Tried today and restarting the instance got me reconnected--didn't have to remove s3 files.

I think this narrows down the problem to just IPFS restarting..

what we can do when this happens right now is something like

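// Returns true when the ipfs node is no longer subscribed to `topic`
// or cannot be reached, i.e. when we should subscribe again.
// `ipfs` is the client instance already in scope.
async function needsResub(topic) {
  let resub = false
  try {
    // ls() resolves to an array of the topic names the node is subscribed to
    const topics = await ipfs.pubsub.ls()
    if (!topics.includes(topic)) resub = true
  } catch (error) {
    console.error('ipfs api connection failing')
    resub = true
  }
  return resub
}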

and then call this before every pubsub.publish. This would still result in some messages not being received, though, because we only notice the problem and successfully resub once we actually try to send.
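A minimal sketch of what that check-before-publish wrapper could look like is below. `publishWithResub` is a hypothetical name, the `ipfs` client and the subscription `handler` are passed in explicitly for illustration, and the only calls assumed are `ipfs.pubsub.ls()`, `ipfs.pubsub.subscribe(topic, handler)`, and `ipfs.pubsub.publish(topic, data)`. The expected type of `data` depends on the client version, so treat that as an assumption too.

```javascript
// Hypothetical wrapper, not part of js-ceramic or js-ipfs.
// Checks the node's topic list, resubscribes if needed, then publishes.
// Resolves to true when a resubscribe happened before publishing.
async function publishWithResub(ipfs, topic, data, handler) {
  let subscribed = false
  try {
    const topics = await ipfs.pubsub.ls()
    subscribed = topics.includes(topic)
  } catch (err) {
    // Treat an unreachable API the same as a missing subscription.
    subscribed = false
  }

  if (!subscribed) {
    // Throws if the ipfs node is still down; the caller decides whether to retry.
    await ipfs.pubsub.subscribe(topic, handler)
  }

  await ipfs.pubsub.publish(topic, data)
  return !subscribed
}
```

Note that this only detects a missing subscription on the ipfs node. It cannot tell whether a handler that is still listed is actually receiving messages.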

v-stickykeys commented 3 years ago

Pasting what I wrote in Discord... This is what I'm seeing now

  1. delete ipfs config files
  2. they get recreated
  3. connect via pubsub locally with new peer id--it works
  4. restart ceramic on infra--pubsub stops working locally and still not working on ceramic 🙃

v-stickykeys commented 3 years ago

Resolved