automerge / hypermerge

Build p2p collaborative applications without any server infrastructure in Node.js
MIT License
1.28k stars 66 forks source link

Limit number of open file descriptors. #72

Closed Gozala closed 1 year ago

Gozala commented 4 years ago

Goal of this changes is to address #69

This pull request adds RandomAcccessManager as replacement for random-access-file. It takes options for LRU and any random-access-store factory (e.g. random-access-file) and returns managed store factory. All the stores from that factor will comply with limits of LRU.

I have also merged test case from #54 which now seems to pass. However I observe some other test failure and too sleep now to investigate that further.

It probably also would make sense to break out RandomAcccessManager into own repo in random-access-storage org to ensure API compatibility.

When random-access-store is evicted by manager it's is closed, however attempt to read / write / ... to that store will reopen it and place it back to cache. However I am not fully confident that stores work properly if reopen, so there needs to be test to verify that.

Right now I have configured random access manager to max 500 stores with max idle time of 30 mins. Once max number limit is reached all the idle stores will get sweeped. However both options should probably be passed to the repo instead.

It would also make sense to handle case where open file limits might be less than provided max configuration to adjust limits accordingly. I might add such a change in the followup commit.

pvh commented 4 years ago

This looks good! I'd been working on something similar but I had trouble with callbacks racing when files were closed with changes in flight. I haven't run the tests yet but I suspect you might see something similar, particularly when the app tries to open thousands of cores all at once.

Gozala commented 4 years ago

Ah yes, you’re right concurrent ops would cause open requests. Maybe just wrapping the raf is not gonna do and I just need to fork it instead 😫

Gozala commented 4 years ago

This looks good! I'd been working on something similar but I had trouble with callbacks racing when files were closed with changes in flight. I haven't run the tests yet but I suspect you might see something similar, particularly when the app tries to open thousands of cores all at once.

Are those test written somewhere ? I would like to incorporate them.

I have also wrote a patch for random-acess-storage to add reopen flag which I find to be a better place to deal with the races that @pvh mentions otherwise wrapper needs to recreate a job queue very similar to the one that it already has in place.

Gozala commented 4 years ago

With this stuff in place I keep getting failures in Network.test.ts https://github.com/automerge/hypermerge/blob/af7e10402389441cf458d606cf3c138aa67006fa/tests/Network.test.ts#L30

ok 2316 netB finds netA
not ok 2317 plan != count
  ---
    operator: fail
    expected: 5
    actual:   6
    at: Queue.<anonymous> (/Users/gozala/Projects/hypermerge/tests/Network.test.ts:30:7)
    stack: |-
      Error: plan != count
          at Test.assert [as _assert] (/Users/gozala/Projects/hypermerge/node_modules/tape/lib/test.js:225:54)
          at Test.bound [as _assert] (/Users/gozala/Projects/hypermerge/node_modules/tape/lib/test.js:77:32)
          at Test.fail (/Users/gozala/Projects/hypermerge/node_modules/tape/lib/test.js:318:10)
          at Test.bound [as fail] (/Users/gozala/Projects/hypermerge/node_modules/tape/lib/test.js:77:32)
          at Test.assert [as _assert] (/Users/gozala/Projects/hypermerge/node_modules/tape/lib/test.js:310:14)
          at Test.bound [as _assert] (/Users/gozala/Projects/hypermerge/node_modules/tape/lib/test.js:77:32)
          at Test.equal (/Users/gozala/Projects/hypermerge/node_modules/tape/lib/test.js:385:10)
          at Test.bound [as isEqual] (/Users/gozala/Projects/hypermerge/node_modules/tape/lib/test.js:77:32)
          at Queue.<anonymous> (/Users/gozala/Projects/hypermerge/tests/Network.test.ts:30:7)
          at Queue.<anonymous> (/Users/gozala/Projects/hypermerge/src/Network.ts:99:20)
  ...
ok 2318 netB records a discovery

/Users/gozala/Projects/hypermerge/tests/misc.ts:227
    if (!fn) throw new Error(`Invoked more times than expected. Invoked with: ${inspect(args)}`)
                   ^
Error: Invoked more times than expected. Invoked with: [
  NetworkPeer {
    onMsg: [Function],
    onConnectionClosed: [Function],
    isClosing: false,
    closedConnectionCount: 0,
    pendingConnections: Set {},
    connectionQ: Queue {
      queue: [],
      enqueue: [Function],
      name: 'NetworkPeer:connectionQ',
      log: [Function],
      push: [Function],
      subscription: [Function]
    },
    selfId: '8NLySgr4vURk4sL8KRDgYwgAMi9xu53M4TbqNNUAN5A6',
    id: '6ya1FgD9eEnwhstNDDQBUiaygJeoJCCurZrqx5G5YSks',
    busCache: WeakCache [WeakMap] { <items unknown>, create: [Function] },
    connection: PeerConnection {
      onMsg: [Function],
      type: 'tcp',
      isClient: true,
      heartbeat: [Heartbeat],
      rawSocket: [Socket],
      secureStream: [NoisePeer],
      multiplex: [Multiplex],
      internalBus: [MessageBus],
      id: '7b1229ce-a517-49b2-a541-fe2882bb7c63',
      onClose: [Function]
    }
  }
]
    at Queue.newFn (/Users/gozala/Projects/hypermerge/tests/misc.ts:227:20)
    at Queue.<anonymous> (/Users/gozala/Projects/hypermerge/src/Network.ts:99:20)
    at NetworkPeer.confirmConnection (/Users/gozala/Projects/hypermerge/src/NetworkPeer.ts:93:22)
    at NetworkPeer.pickNewConnection (/Users/gozala/Projects/hypermerge/src/NetworkPeer.ts:82:14)
    at NetworkPeer.onConnectionClosed (/Users/gozala/Projects/hypermerge/src/NetworkPeer.ts:125:12)
    at PeerConnection.conn.onClose (/Users/gozala/Projects/hypermerge/src/NetworkPeer.ts:66:31)
    at PeerConnection.close (/Users/gozala/Projects/hypermerge/src/PeerConnection.ts:84:17)
    at onTimeout (/Users/gozala/Projects/hypermerge/src/PeerConnection.ts:38:29)
    at Timeout.onTimeout (/Users/gozala/Projects/hypermerge/src/Heartbeat.ts:16:7)
    at Timeout._onTimeout (/Users/gozala/Projects/hypermerge/src/Heartbeat.ts:80:12)
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

But seem to figure out why. If I remove stress test or significantly reduce number of files test no longer fails. Do people who know more details have any idea what this error might mean or how does it relate ?

Thanks

Gozala commented 4 years ago

This looks good! I'd been working on something similar but I had trouble with callbacks racing when files were closed with changes in flight. I haven't run the tests yet but I suspect you might see something similar, particularly when the app tries to open thousands of cores all at once.

I think the races you were observing were due to following bug https://github.com/random-access-storage/random-access-storage/issues/9

pvh commented 4 years ago

Well, that's exciting! I suppose we should test your patch with this fix?