ais-dotnet / Ais.Net.Receiver

A simple .NET AIS Receiver for capturing the Norwegian Coastal Administration AIS network data (available under Norwegian license for public data (NLOD)) and persisting in Microsoft Azure Blob Storage. Sponsored by endjin.
Apache License 2.0
16 stars 12 forks source link

NetworkStreamNmeaReceiver not keeping up when parsing messages #160

Open pckv opened 1 year ago

pckv commented 1 year ago

We've been having issues where the network stream keeps stopping occasionally for a few seconds until it resumes. There are no logged error messages, and when it resumes the stream is as fast as before, so we're losing a lot of data. Eventually, after several minutes, it results in Corvus failing with the following error message:

System.ArgumentOutOfRangeException: The value needs to translate in milliseconds to -1 (signifying an infinite timeout), 0, or a positive integer less than or equal to the maximum allowed timer duration. (Parameter 'delay')
    at System.Threading.Tasks.Task.ValidateTimeout(TimeSpan timeout, ExceptionArgument argument)
    at System.Threading.Tasks.Task.Delay(TimeSpan delay)
    at Corvus.Retry.Async.SleepService.DefaultSleepService.SleepAsync(TimeSpan timeSpan)
    at Corvus.Retry.Retriable.RetryAsync(Func`1 asyncFunc, CancellationToken cancellationToken, IRetryStrategy strategy, IRetryPolicy policy, Boolean continueOnCapturedContext)
    at AisLog.Services.AisLogService.ExecuteAsync(CancellationToken cancellationToken) in /source/Services/AisLogService.cs:line 102
    at Microsoft.Extensions.Hosting.Internal.Host.TryExecuteBackgroundServiceAsync(BackgroundService backgroundService)

This started happening a few weeks ago on our local development setups, and just recently it's also affecting our production server. CPU load is rarely high on either setup (<1%).

I've conducted some tests of running the receiver in an isolated environment, counting the number of messages received in a minute.

using Ais.Net.Receiver.Receiver;

var receiver = new NetworkStreamNmeaReceiver("153.44.253.27", 5631, TimeSpan.FromSeconds(0.5));
var receiverHost = new ReceiverHost(receiver);

var count = 0;

receiverHost.Sentences.Subscribe(_ => count++);
//receiverHost.Messages.Subscribe(_ => {});

await Task.WhenAny(receiverHost.StartAsync(), Task.Delay(TimeSpan.FromMinutes(1)));

Console.WriteLine(count);

Here are results of 5 runs from my laptop:

$ dotnet run -> 2473
$ dotnet run -> 2410
$ dotnet run -> 2402
$ dotnet run -> 2400
$ dotnet run -> 2401

Subscribing to ReceiverHost.Sentences seemingly works fine, as every test I've run has resulted in between 2400-2500 messages.

However, uncommenting receiverHost.Messages.Subscribe(_ => {}); thus also parsing the messages using a subscription, I've been getting mixed results on the sentence count. Another 5 runs of this new code on my laptop:

$ dotnet run -> 743
$ dotnet run -> 499
$ dotnet run -> 935
$ dotnet run -> 837
$ dotnet run -> 373

I've also tried only subscribing to messages and counting those which results in similar numbers. Running tests with something like receiverHost.Sentences.Subscribe(Console.WriteLine); I can also inspect that subscribing to Sentences results in a steady stream of messages, whereas when subscribing to Messages the stream pauses multiple times.

The reason I've been specifying laptop is that it seems to run fine (at the moment) on my desktop machine, which understandably is more powerful.

Laptop: Dell XPS 13 9310, Linux Ubuntu 22.04.1 LTS, Intel Core i7-1185G7, 16 GB RAM Desktop: Windows 10, AMD Ryzen 5600X, 32 GB RAM Production server: Virtual LXC Server, Linux Ubuntu 20.04 LTS, 2 vCPUs (unsure but might be Intel Core i7-3770), 2 GB RAM

What could be the cause of this? Consiering Ais.Net reportedly can handle more than a million sentences per minute, I'd assume this is a problem in the receiver.

HowardvanRooijen commented 1 year ago

Hi! Thanks for the details. What network stream source are you listening to?

pckv commented 1 year ago

Hello and thanks for the quick response! I'm listening to the same source as used in my test code, so 153.44.253.27:5631, which is the public Norwegian AIS stream.

HowardvanRooijen commented 1 year ago

That can definitely be "less than reliable" - I've seen issues with this endpoint via a number of different scenarios - the machine hosting the service getting rebooted (I suspect), the broadcast service crashing (i.e. the TCP endpoint is available but nothing responds) and various other glitches.

Adding some more robustness has been on my todo list for a while - I host the service on a Raspberry PI and it tends to fall over every couple of weeks, but root causing has proved a little elusive. I'll see if I can get some time with @idg10 today to see if he has any ideas.

pckv commented 1 year ago

I'll see if I can get some time with @idg10 today to see if he has any ideas.

That would be wonderful.

I've now also ran the same test, this time on my desktop which seemed to handle the public endpoint just fine, on the Norwegian Coastal Administration's closed source which provides more data.

These new results demonstrate the real issue with parsing the messages. When subscribing to Sentences, the count is very high and consistent:

$ dotnet run -> 15945
$ dotnet run -> 15957
$ dotnet run -> 16322
$ dotnet run -> 16082
$ dotnet run -> 16391

But when subscribing to Messages and thus parsing every sentence, the results are amazingly low:

$ dotnet run -> 2
$ dotnet run -> 45
$ dotnet run -> 12
$ dotnet run -> 11
$ dotnet run -> 69

This is way lower than our production setup, which seemingly receives a few thousand messages every second in between all of the pauses, which parses the messages and stores them in the database. This is most certainly a problem that occurs when parsing the messages, and not an actual problem with the endpoint. I'll attempt to run the first test (the one where it runs fine) while simultaneously performing some slightly expensive operations to see if that has the same effect.

pckv commented 1 year ago

@HowardvanRooijen I've managed to find the cause of the pausing. The stream crashes due to an uncaught NotImplementedException caused in https://github.com/ais-dotnet/Ais.Net.Receiver/blob/c8e29750256d4c58aed21e143d1f3d793f240857/Solutions/Ais.Net.Receiver/Ais/Net/Receiver/Receiver/ReceiverHost.cs#L65

The exception is raised due to OnError not being implemented in NmeaToAisMessageTypeProcessor, by the following line https://github.com/ais-dotnet/Ais.Net.Receiver/blob/c8e29750256d4c58aed21e143d1f3d793f240857/Solutions/Ais.Net.Receiver/Ais/Net/Receiver/Parser/NmeaToAisMessageTypeProcessor.cs#L70

Simply catching this exception solves our problem and the stream remains consistent.

try
{
    lineStreamProcessor.OnNext(new NmeaLineParser(lineAsAscii), 0);
}
catch (ArgumentException ex)
{
    if (errorSubject.HasObservers)
    {
        errorSubject.OnNext((ex, line));
    }
}
catch (NotImplementedException e)
{
    if (errorSubject.HasObservers)
    {
        errorSubject.OnNext((e, line));
    }
}

I still have no clue where this exception is raised or why it so rarely is. It seems to be raised by the processor and not the NmeaLineParser.

I'm not sure what the best way to go about resolving this would be, so I hope you have a better idea on what to do here. Otherwise, I could create a PR that simply ignores the error (which I suppose is not ideal).

We're currently using version 0.2.0-preview.5, and would very much appreciate a new build ASAP.

HowardvanRooijen commented 1 year ago

What's throwing the NotImplementedException? Are you able to capture the data that's causing that to be thrown?

pckv commented 1 year ago

Seems like there are some Class B vessel messages that are not being parsed properly. Here's a line that caused it to crash:

\\g:1-2-22691,s:2573115,c:1665565535*47\\!BSVDM,1,1,1,A,H3mhk`PEH400000000000000000,2*20

With the exception:

System.ArgumentException: Can only handle non-zero padding on the final message in a fragment

Since this also is an ArgumentException I've resolved the previous issue again by just doing

public void OnError(in ReadOnlySpan<byte> line, Exception error, int lineNumber)
{
    throw error;
}
HowardvanRooijen commented 1 year ago

Good troubleshooting! So it looks like there's some data being broadcast that doesn't adhere to the spec... it's one of the challenges of creating a data driven parser, these real-world use cases only happen in the real world and require sample data to fix... adding some error handling will definitely help... I should add some more robust handling / logging to the receiver app.

pckv commented 1 year ago

Preventing crashes upon reading a message that can't be parsed would at least be a good start for our case! We don't mind some messages being skipped. But at least the same issues arise in the public AIS stream we used first, so the issue should be reproducible.

HowardvanRooijen commented 1 year ago

There are a few different layers to this project - Ais.NET - that has proper test coverage in order to implement the spec, Ais.Net.Models and Ais.Net.Receiver are more demo/skunk work projects that we wrote to explore other use cases for Ais.Net and to create good demos, and to capture some free real data for analysis for user group talks... it's definitely not a production system. Ais.Net is the high performance layer - "Models" is the convenience layer that models the domain in a lower performance way but is easier for a human to grok.

HowardvanRooijen commented 1 year ago

We've just merged a PR (https://github.com/ais-dotnet/Ais.Net.Receiver/pull/155) into master which adds better error handling and you can configure the verbosity of log output too, including statistics:

25/10/2022 14:17:57: Sentences: 2564 | Messages: 2126 | Errors 66
25/10/2022 14:18:57: Sentences: 2596 | Messages: 2114 | Errors 80
25/10/2022 14:19:57: Sentences: 2536 | Messages: 2069 | Errors 68
25/10/2022 14:20:57: Sentences: 2579 | Messages: 2107 | Errors 73
25/10/2022 14:21:57: Sentences: 2686 | Messages: 2175 | Errors 80
25/10/2022 14:22:57: Sentences: 2625 | Messages: 2152 | Errors 71
25/10/2022 14:23:57: Sentences: 2565 | Messages: 2103 | Errors 88
25/10/2022 14:24:57: Sentences: 2557 | Messages: 2060 | Errors 89
25/10/2022 14:25:57: Sentences: 2618 | Messages: 2110 | Errors 86
25/10/2022 14:26:57: Sentences: 2637 | Messages: 2160 | Errors 82
HowardvanRooijen commented 1 year ago

Oh and on my Raspberry PI, it's consuming 1.2% of the CPU and 1.8% of the memory.

HowardvanRooijen commented 1 year ago

We've just published these fixes (and various other updates) as a 0.2 release. https://github.com/ais-dotnet/Ais.Net.Receiver/releases/tag/0.2.0

This release requires some updates to the configuration, you can find details on the readme: https://github.com/ais-dotnet/Ais.Net.Receiver/tree/0.2.0#configuration