Two identical scripts get two different sets of messages

Zia- commented 11 months ago

Hello,

To experiment with the API, I ran the code mentioned here https://github.com/aisstream/issues/issues/35 for the entire globe on two different machines, on two different internet connections, using two different API keys (so effectively acting as two different users). Both scripts ran for exactly the same 25-minute time bin (as described here https://github.com/aisstream/issues/issues/37). I expected them to receive exactly the same set of messages. However, one received 718,431 messages, while the other received 700,294.

Could you help me understand this behaviour? To make sure we don't miss any messages, should I run multiple identical scripts and merge the received messages on my end, handling duplicates etc.?
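If you do end up running several identical consumers, merging their dumps could look roughly like the sketch below. It assumes each dump is a JSON list of message objects, like the `ais.json` file written by the script later in this thread, and that the only client-side field is `time_now`; two messages count as duplicates when everything except `time_now` is identical. `canonical_key` and `merge_messages` are hypothetical helper names, not part of any aisstream.io API.

```python
import json

# Fields the client adds locally; they differ between machines, so they are
# ignored when deciding whether two messages are the same server message.
LOCAL_FIELDS = {"time_now"}


def canonical_key(msg: dict) -> str:
    """Serialise a message without client-side fields, with sorted keys."""
    payload = {k: v for k, v in msg.items() if k not in LOCAL_FIELDS}
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def merge_messages(dumps: list[list[dict]]) -> list[dict]:
    """Merge several dumps, keeping the first copy of each distinct message."""
    seen: set[str] = set()
    merged: list[dict] = []
    for dump in dumps:
        for msg in dump:
            key = canonical_key(msg)
            if key not in seen:
                seen.add(key)
                merged.append(msg)
    return merged
```

For example, `merge_messages([json.load(open(p)) for p in ("ais_a.json", "ais_b.json")])`, where the two file names are placeholders. One caveat of keying on content: if the server legitimately delivers two messages with identical content, they are collapsed into one.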

aisstream commented 11 months ago

We do not guarantee that messages are received in the same order on different web-sockets, because we write to web-sockets asynchronously. For example, Message A may be scheduled to be sent before Message B on one web-socket, while the order is reversed on another.

I am skeptical of this test for a few reasons, listed below:

1. How did you guarantee they were running for exactly the same time? We send 400-800 messages a second, so even closing the connections a few seconds apart will skew the results.
2. How did you guarantee the TCP read buffer was fully flushed before closing the connections? Messages sent by aisstream.io could have been sitting in the TCP buffer, not yet read by your application.
3. How did you guarantee that your application was consuming messages at the same rate? A large problem we see at aisstream.io is clients not consuming messages quickly enough, which causes the write buffers of their TCP connections to block under TCP flow control.
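One common way to keep the read side fast is to do as little as possible in the websocket callback and hand the raw text to another thread. The sketch below assumes websocket-client's `on_message(ws, message)` callback signature; `raw_queue`, `parse_worker` and `parsed` are illustrative names. It is a general producer/consumer pattern, not something aisstream.io prescribes, and whether it helps depends on where a given client actually spends its time.

```python
import json
import queue
import threading
import time

# Raw (arrival_time, text) pairs; None is used as a stop sentinel.
raw_queue: "queue.Queue[tuple[float, str] | None]" = queue.Queue()
parsed: list[dict] = []


def on_message(ws, message):
    # Do the minimum in the socket-reading thread: timestamp and enqueue.
    raw_queue.put((time.time(), message))


def parse_worker():
    # Runs in its own thread; JSON parsing and storage happen here.
    while True:
        item = raw_queue.get()
        if item is None:
            raw_queue.task_done()
            break
        received_at, message = item
        msg = json.loads(message)
        msg["time_now"] = received_at
        parsed.append(msg)
        raw_queue.task_done()


worker = threading.Thread(target=parse_worker, daemon=True)
worker.start()
```

Passing this `on_message` to `websocket.WebSocketApp` keeps the per-message work in the reading thread small. The queue is unbounded, so if parsing falls behind for long periods, memory grows on the client instead of the TCP connection stalling.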

I will admit that a difference of about 18,000 messages does seem large, and there is a possibility of a bug in the code, but some difference in the number of messages received over the same period by two different websockets is expected.
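To put the figure in perspective, the counts from the original report can be converted into seconds of stream. This is back-of-the-envelope arithmetic using only numbers quoted in this thread: the window from the script's `start_time`/`end_time`, the two message counts, and the quoted 400-800 messages per second.

```python
# Figures quoted in this thread
start_time = 1690189726
end_time = 1690191226
count_a = 718_431
count_b = 700_294

window_s = end_time - start_time                # 1500 s = 25 minutes
diff = count_a - count_b                        # difference in total counts
mean_rate = (count_a + count_b) / 2 / window_s  # observed messages per second

print(f"difference: {diff} messages ({diff / count_a:.1%} of the larger count)")
print(f"observed mean rate: {mean_rate:.0f} msg/s")
print(f"gap at the observed rate: ~{diff / mean_rate:.0f} s of stream")
print(f"gap at 400-800 msg/s: {diff / 800:.0f}-{diff / 400:.0f} s of stream")
```

With these numbers the gap is about 2.5% of the larger count, or roughly 38 seconds of stream at the observed rate of about 473 messages per second (23 to 45 seconds at 400-800 messages per second).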

aisstream commented 11 months ago

Could you include your script so that we can review it?

Zia- commented 11 months ago

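```python
import _thread
import json
import time
from datetime import datetime

import websocket  # from the websocket-client package

api_key = "<API Key>"
# Seconds to keep the connection open after subscribing. This must reach past
# end_time, otherwise the dump in on_message is never triggered.
sleep = 30 * 60
types = ['ShipStaticData', 'StaticDataReport']
dump_size = 5000

message_time = datetime.now()
i = 0
messages = []

start_time = 1690189726  # Unix time (local clock) at which recording starts
end_time = 1690191226    # start_time + 1500 s, i.e. a 25-minute window
boolean1 = True          # True until the window has been written to disk
counter = 0


def on_message(ws, message):
    global boolean1
    time_now = time.time()
    # Keep only messages whose local arrival time falls in [start_time, end_time)
    if start_time <= time_now < end_time:
        msg = json.loads(message)
        msg['time_now'] = time_now
        messages.append(msg)

    # Write the window to disk once, on the first message at or after end_time
    if time_now >= end_time and boolean1:
        with open("ais.json", "w") as dump:
            json.dump(messages, dump)
        boolean1 = False


def on_error(ws, error):
    print("ERROR:", error)


def on_close(ws, close_status_code, close_msg):
    print('CLOSED CONNECTION', close_status_code, close_msg)


def on_open(ws):
    def run(*args):
        subscribe_message = {"APIKey": api_key, "BoundingBoxes": [[[-180, -90], [180, 90]]]}
        ws.send(json.dumps(subscribe_message))
        time.sleep(sleep)
        ws.close()
    # Subscribe, then close the connection from a background thread after `sleep` seconds
    _thread.start_new_thread(run, ())


if __name__ == "__main__":
    ws = websocket.WebSocketApp("wss://stream.aisstream.io/v0/stream",
                                on_message=on_message,
                                on_error=on_error,
                                on_close=on_close)
    try:
        ws.on_open = on_open
        t0 = time.time()
        ws.run_forever()
        errored = ws.has_errored

    except KeyboardInterrupt:
        ws.close()
        print('Closed connection')
```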

Above is the code I ran on two different machines with two different internet connections. The only thing I changed was the API key (taken from two different accounts). Now, to answer your questions:

1. "How did you guarantee they were running for the exact same time? We send 400-800 messages a second so even killing the connection a few seconds apart will skew results."
   I'm not killing them manually. The start_time and end_time Unix timestamps make both scripts record the same 25-minute window. I set start_time 10 minutes ahead of the current time, so that both scripts were already running when the window opened (avoiding the lag from manually starting the two scripts).
2. "How did you guarantee the tcp read buffer was fully flushed before closing the connections?"
   I'm not sure about this part. As the code above shows, once the current time reaches end_time, the messages list is written to the JSON file exactly once (guarded by the boolean1 variable), so messages received afterwards are not saved. The actual closing of the connection therefore happens well after end_time and does not affect the result. However, I'm not sure how to verify that the TCP buffer was fully flushed.
3. "How did you guarantee that your application was consuming messages at the same rate?"
   Since I only write the messages list to a JSON file once the current time reaches end_time, I assumed there was no other I/O bottleneck to worry about. The idea was to append every message received between start_time and end_time to the messages list, without any post-processing or file writes in between that could slow down consumption.
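One way to check the consumption-rate question after the fact is to look at how the recorded `time_now` values are spread over the window. At 400-800 messages per second, a run of several seconds with no messages, followed by a burst, would suggest the reader stalled and then drained a backlog, or that the connection dropped. The sketch assumes the `ais.json` format produced by the script above; `per_second_counts` and `find_gaps` are hypothetical helpers. Note that `time_now` comes from each machine's own clock via `time.time()`, so two machines' windows are only as aligned as their clocks.

```python
from collections import Counter


def per_second_counts(messages: list[dict]) -> Counter:
    """Count messages per whole second of local arrival time (the time_now field)."""
    return Counter(int(m["time_now"]) for m in messages)


def find_gaps(counts: Counter, min_gap_s: int = 2) -> list[tuple[int, int]]:
    """Return (first_empty_second, length) for runs of >= min_gap_s empty seconds."""
    seconds = sorted(counts)
    gaps = []
    for prev, cur in zip(seconds, seconds[1:]):
        missing = cur - prev - 1
        if missing >= min_gap_s:
            gaps.append((prev + 1, missing))
    return gaps
```

For example, loading `ais.json` with `json.load` and passing the list to `per_second_counts` gives a per-second histogram; `find_gaps` lists the quiet stretches and `Counter.most_common(5)` shows the busiest seconds. Running both on each machine's dump shows whether the shortfall is concentrated in particular seconds.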

I'm not concerned about the order in which the two scripts receive messages. However, I expected the counts to be close, if not identical. When you say "a difference in the number of messages received over the same period by two different websockets is expected", is there a ballpark figure for this difference?
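Since order does not matter here, a more informative check than the raw counts is to compare the two dumps as multisets: how many messages both machines received, and how many only one of them received. A sketch, assuming the same `ais.json` format and treating messages that are identical apart from the client-side `time_now` field as the same message (`compare_dumps` is a hypothetical helper):

```python
import json
from collections import Counter


def canonical_key(msg: dict) -> str:
    # Ignore the client-side time_now field; sort keys so dict order does not matter
    payload = {k: v for k, v in msg.items() if k != "time_now"}
    return json.dumps(payload, sort_keys=True)


def compare_dumps(a: list[dict], b: list[dict]) -> dict[str, int]:
    """Count messages seen by both dumps and by only one of them (order-insensitive)."""
    ca = Counter(canonical_key(m) for m in a)
    cb = Counter(canonical_key(m) for m in b)
    return {
        "common": sum((ca & cb).values()),
        "only_a": sum((ca - cb).values()),
        "only_b": sum((cb - ca).values()),
    }
```

If most of the one-sided messages were received near start_time or end_time, that would point to an edge effect of where each machine's window fell; if they are spread throughout the window, messages were more likely being missed mid-stream.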

I really appreciate your help in looking into it.

aisstream commented 11 months ago

Thank you for the script. We will run the script ourselves at some point in the future.

The intention of questions 1 and 2 was mainly to show that many factors outside your application's code can cause a different number of messages to be read over the same period. It would be very surprising if two websockets consumed exactly the same number of messages over the same period.

We cannot give a ballpark figure for the expected difference, but assuming your service does not get disconnected, both connections should converge to receiving the same set of messages.

Zia- commented 11 months ago

Thanks a lot. Makes sense.

It seems like it all boils down to this issue https://github.com/aisstream/issues/issues/35. As long as I make sure the websocket doesn't get disconnected, I don't think I need to worry much about tests over very short time intervals (25 minutes in the case above).
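Keeping the websocket connected in practice usually means reconnecting automatically when it drops. A minimal sketch, assuming a blocking `connect_once` callable that returns (or raises) when the connection ends; `run_with_reconnect` is a hypothetical helper, and exponential backoff with jitter is a generic technique, not something aisstream.io documents:

```python
import random
import time
from typing import Callable


def run_with_reconnect(
    connect_once: Callable[[], None],
    should_stop: Callable[[], bool],
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call connect_once() until should_stop() is True; return the number of attempts."""
    attempts = 0
    failures = 0
    while not should_stop():
        attempts += 1
        started = time.monotonic()
        try:
            connect_once()  # blocks while the connection is alive
        except Exception as exc:  # keep reconnecting on any connection error
            print(f"connection error: {exc!r}")
        # A connection that stayed up longer than max_delay resets the backoff
        if time.monotonic() - started > max_delay:
            failures = 0
        else:
            failures += 1
        if should_stop():
            break
        delay = min(max_delay, base_delay * 2 ** (failures - 1))
        sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries
    return attempts
```

With websocket-client, `connect_once` could be `ws.run_forever`, since `on_open` runs again on each new connection and re-sends the subscription message, and `should_stop` could be something like `lambda: time.time() >= end_time`. Messages published while the client is disconnected are not captured by this loop.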

Two identical scripts get two different sets of messages #38