BitMEX / api-connectors

Libraries for connecting to the BitMEX API.
https://www.bitmex.com/app/restAPI

Nodejs streaming API, tracking number of rows returned in data #178

Open mariusk opened 6 years ago

mariusk commented 6 years ago

The streaming API example shows fetching the last entry of data from the last streaming event. Based on what I'm observing, the streaming API batches events: the length of data can grow by more than one between callbacks, at least until the data table is full.

It seems the number of new rows is not exposed, but it is easy to track yourself, at least until the data table fills up.

Then what happens? The docs say it uses a FIFO queue, which the code seems to confirm (pasted below).

The question then is, after the data table has filled up, how do we know how many items were actually added to the end of data from the last streaming event?

As far as I can see, we do not. It's easy enough to scan backwards through the table, of course, to find the last known entry. However, it would be just as easy to pass the callback either the index of the first fresh row in the data table, or the number of items that were deleted. That way no scanning would be required.

Or am I missing something entirely?
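To make the problem concrete, here is a minimal, hypothetical sketch (not library code; `MAX_TABLE_LEN` stands in for `client._maxTableLen`) of how a FIFO-capped table hides the batch size once it is full:

```javascript
// Simulates the library's FIFO trim: once the table is at capacity,
// its length stays constant, so the size of the incoming batch
// cannot be recovered from the table alone.
const MAX_TABLE_LEN = 5; // stand-in for client._maxTableLen

function applyInsert(table, rows) {
  table.push(...rows);                               // append new rows
  if (table.length > MAX_TABLE_LEN) {
    table.splice(0, table.length - MAX_TABLE_LEN);   // drop oldest (FIFO)
  }
  return table;
}

const table = [1, 2, 3, 4, 5];   // already at capacity
applyInsert(table, [6, 7, 8]);   // a batch of 3 new rows arrives
console.log(table);              // [ 4, 5, 6, 7, 8 ]
console.log(table.length);       // 5 -- unchanged, batch size is lost
```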

  // For each subscription,
  toSubscribe.forEach(function(table) {
    // Create a subscription topic.
    const subscription = `${table}:*:${symbol}`;

    debug('Opening listener to %s.', subscription);

    // Add the listener for deltas before subscribing at BitMEX.
    // These events come from createSocket, which does minimal data parsing
    // to figure out what table and symbol the data is for.
    //
    // The emitter emits 'partial', 'update', 'insert', and 'delete' events, listen to them all.
    client.on(subscription, function(data) {
      const [table, action, symbol] = this.event.split(':');

      try {
        const newData = deltaParser.onAction(action, table, symbol, client, data);
        // Shift oldest elements out of the table (FIFO queue) to prevent unbounded memory growth
        if (newData.length > client._maxTableLen) {
          newData.splice(0, newData.length - client._maxTableLen);
        }
        callback(newData, symbol, table);
      } catch(e) {
        client.emit('error', e);
      }
    });
  });
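A minimal sketch of the change being proposed, using hypothetical names (`applyAndTrim` is not part of this repo): report the trimmed count alongside the data, so a consumer can compute how many rows are fresh without scanning backwards.

```javascript
// Hypothetical helper: apply an insert batch with a FIFO cap, and
// report how many rows were trimmed plus how many rows are new.
function applyAndTrim(table, incoming, maxLen) {
  const before = table.length;
  table.push(...incoming);
  let trimmed = 0;
  if (table.length > maxLen) {
    trimmed = table.length - maxLen;
    table.splice(0, trimmed);      // drop oldest rows (FIFO)
  }
  // Fresh rows this event = growth in length plus whatever was trimmed.
  const newRows = (table.length - before) + trimmed;
  return { table, trimmed, newRows };
}

// Usage: a full table (cap 5) receives a batch of 3 rows.
const res = applyAndTrim([1, 2, 3, 4, 5], [6, 7, 8], 5);
console.log(res.table);    // [ 4, 5, 6, 7, 8 ]
console.log(res.newRows);  // 3
console.log(res.trimmed);  // 3
```

In the library, the equivalent would be passing that count (or the index of the first fresh row) as an extra callback argument.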
ryanfox commented 6 years ago

You're right that the number of new items is not currently preserved. What would you use that information for?

mariusk commented 6 years ago

@ryanfox When data is already at full capacity, items are appended to the end of the array, and items are then removed from the beginning so that the size of data does not exceed _maxTableLen.

A consumer of the streaming API typically wants every event, not just the last event. If you want to argue that only the last event is relevant, then why do you bother appending all the new items to the array in the first place?

I've already implemented and tested a workaround: I detect the number of appended items by scanning backwards until I recognize the last item seen. But if the number of new rows were returned, the scan would not be necessary, and fewer people would be scratching their heads to reach the same conclusion (or worse, introducing bugs due to unnecessary complexity).
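A hedged sketch of the backward-scan workaround described above, assuming each row carries a unique key (the `trdMatchID` field here is just an illustrative choice, as on BitMEX trade rows):

```javascript
// Scan backwards through the current snapshot until we find the last
// row we saw on the previous callback; everything after it is new.
function countNewRows(data, lastSeenKey, keyOf) {
  for (let i = data.length - 1; i >= 0; i--) {
    if (keyOf(data[i]) === lastSeenKey) {
      return data.length - 1 - i;  // rows appended after the last known one
    }
  }
  // Last-seen row was already trimmed out of the FIFO table:
  // treat every row as new (and consider whether rows were missed).
  return data.length;
}

const keyOf = (row) => row.trdMatchID;
const snapshot = [
  { trdMatchID: 'a' }, { trdMatchID: 'b' },
  { trdMatchID: 'c' }, { trdMatchID: 'd' },
];
console.log(countNewRows(snapshot, 'b', keyOf)); // 2 ('c' and 'd' are new)
```

Note the fallback case: if the batch was large enough to trim the last-seen row out of the table entirely, the scan can no longer tell how many rows were dropped, which is exactly why having the library report the count would be more robust.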

ryanfox commented 6 years ago

Why not just subscribe to the stream as in example.js? You can supply a callback that gets invoked on every update.

mariusk commented 6 years ago

I had no idea subscribing to the instrument stream would give me everything. I've looked at the output from that stream, and it's kind of rich. That's fine, really. But I am still struggling to figure out how to separate order book updates from trades and the like in that stream. Do you have any links describing the details of that stream, or anything showing how to extract order book updates and trades from the instrument stream? Thanks!

ryanfox commented 6 years ago

The list of available streams can be found in the websocket API documentation.

mariusk commented 6 years ago

The Python wsdump example from the websocket docs you linked to shows subscribing to both the trades and instrument streams. The example.js file you linked to only subscribes to the instrument stream. I've also captured and looked at some data from the instrument stream, and I am unable to find any trade data there.

So based on this, subscribing to the instrument stream does not give me "everything", even though it might give other useful information and it may very well give me one row per update as you claim.

Which brings me back to my original question: assuming the trade stream can also return multiple rows, wouldn't it be useful to return the number of rows, to make it easy to keep track of new items in the data table?

Btw, I am already successfully capturing data from both the trade and quote streams using the "search backward" methodology I described, so this isn't impossible or even hard. I just think returning that additional number would make it a lot easier (and lead to much less buggy software built on this API).

Arombalski commented 6 years ago

I've encountered the same issue when subscribing to the order stream. The responses seem to be getting batched and I'm losing order messages by only looking at the most recent element in data. Can't just loop through all of data because then I'll be incorrectly triggering the "reset" switch for my trading algorithm. I agree with mariusk that knowing the number of new elements passed would be much easier to handle.