dotpcap / sharppcap

Official repository - Fully managed, cross platform (Windows, Mac, Linux) .NET library for capturing packets
1.31k stars 267 forks source link

AccessViolation on SharpPcap.PacketCapture.GetPacket() #343

Closed KoenPlasmansLS closed 1 year ago

KoenPlasmansLS commented 2 years ago

Runtime system: Windows (client installation so not much known and unable to reproduce on developer machine) Installed software: SharpPcap 6.1.0 Application uses .NET Version 4.8

Coming back to issue #295, we upgraded our software to use version 6.1.0 of your library and we noticed that our service still crashes (even though we try to catch unmanaged exceptions...). Note that it is again very hard to reproduce for us. Any clue as to what could be going wrong?

Description: The process was terminated due to an unhandled exception. Exception Info: System.AccessViolationException at System.SpanHelpers.CopyTo[[System.Byte, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](Byte ByRef, Int32, Byte ByRef, Int32) at System.ReadOnlySpan1[[System.Byte, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].TryCopyTo(System.Span1) at System.ReadOnlySpan`1[[System.Byte, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].ToArray() at SharpPcap.PacketCapture.GetPacket() at Lansweeper.PassiveScanning.PassiveScanner.(System.Object, SharpPcap.PacketCapture)

Seems like the bug originates from this line: https://github.com/chmorgan/sharppcap/blob/master/SharpPcap/RawCapture.cs#L96

Our code looks something like this:

private void InitializeCaptureDevice(LibPcapLiveDevice device)
{
    device.OnPacketArrival += Device_OnPacketArrival;
    device.Open(DeviceModes.Promiscuous);
    device.Filter = Filter;
    device.StartCapture();
}

private void Device_OnPacketArrival(object sender, PacketCapture e)
{
    try
    {
        var data = e.GetPacket());
    }
    catch (Exception ex)
    {
        //some code to handle exception here
    }
}
kayoub5 commented 2 years ago

This would happen if device.Close() or device.Dispose() was called while Device_OnPacketArrival is running

KoenPlasmansLS commented 2 years ago

I'm running tests that dispose (in a separate thread) the device on the arrival of a packet and all I get is the following error : DeviceNotReadyException with message "Cannot get datalink, the pcap device is not opened". Now it might be that it is very hard to reproduce as the disposal should be done just in between https://github.com/chmorgan/sharppcap/blob/master /SharpPcap/RawCapture.cs#L94 (which triggers the exception) & https://github.com/chmorgan/sharppcap/blob/master /SharpPcap/RawCapture.cs#L96 which is very hard to reproduce in tests. In our software, it is quite hard to get the closing of the device and the package handling in one thread. I'm just wondering if there is any other solution at hand. The main problem is, is that this generates an unmanaged exception which (for some obscure reason), we're not able to handle in our software (even with configuration or attributes like [HandleProcessCorruptedStateExceptions] & [SecurityCritical]. Hence, our application completely crashes at the customer site with only an event in the windows eventviewer.

kayoub5 commented 2 years ago

There is one solution I can think of:

        RawCapture SafeGetPacket(LibPcapLiveDevice device, PacketCapture e)
        {
            var handle = device.Handle;
            var gotRef = false;
            try
            {
                handle.DangerousAddRef(ref gotRef);
                if (!gotRef) { return null; }
                return e.GetPacket();
            }
            finally
            {
                if (gotRef)
                {
                    handle.DangerousRelease();
                }
            }
        }
KoenPlasmansLS commented 2 years ago

Hmm, I'm not sure what teh DangerousRelease and DangerousAddRef does but adding a "safe" method could work for us, yes!

KoenPlasmansLS commented 2 years ago

It seems I didn't add a part of the exception which explains a few things: System.AccessViolationException at System.Runtime.InteropServices.Marshal.CopyToManaged(IntPtr, System.Object, Int32, Int32) at SharpPcap.LibPcap.PcapDevice.MarshalRawPacket(IntPtr, IntPtr) at SharpPcap.LibPcap.PcapDevice.PacketHandler(IntPtr, IntPtr, IntPtr) at SharpPcap.LibPcap.Windows.pcap_dispatch(IntPtr, Int32, pcap_handler, IntPtr) at SharpPcap.LibPcap.PcapDevice.CaptureThread(System.Threading.CancellationToken) at SharpPcap.LibPcap.PcapDevice+<>c_DisplayClass79_0.b_0() at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) at System.Threading.ThreadHelper.ThreadStart()

&

System.AccessViolationException at System.SpanHelpers.CopyTo[[System.Byte, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](Byte ByRef, Int32, Byte ByRef, Int32) at System.ReadOnlySpan1[[System.Byte, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].TryCopyTo(System.Span1) at System.ReadOnlySpan1[[System.Byte, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].ToArray() at SharpPcap.PacketCapture.GetPacket() at Lansweeper.PassiveScanning.PassiveScanner.(System.Object, SharpPcap.PacketCapture) at SharpPcap.LibPcap.PcapDevice.SendPacketArrivalEvent(SharpPcap.LibPcap.PcapHeader, System.Span1) at SharpPcap.LibPcap.PcapDevice.PacketHandler(IntPtr, IntPtr, IntPtr) at SharpPcap.LibPcap.LibPcapSafeNativeMethods.pcap_dispatch(SharpPcap.LibPcap.PcapHandle, Int32, pcap_handler, IntPtr) at SharpPcap.LibPcap.PcapDevice.CaptureThread(System.Threading.CancellationToken) at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) at System.Threading.ThreadHelper.ThreadStart()

The issue is NOT the AccessViolationException (though if it wasn't there, there wouldn't be a problem). The problem is that you spin up a new thread. Normally using a try/catch within the running of a thread will cause all exceptions to be handled. However, running a test program on your library, it seems I randomly get unhandled exceptions coming from this thread which crashes my test program. If you want, I can give you the test program so you can try it out yourself.

Here is where the thread is spun up. I'm thinking that maybe we can avoid this new thread and use tasks (which supports cancellationtokens as well) instead : https://github.com/chmorgan/sharppcap/blob/82167ba0345c0f5f32136db55760996cf0ae3ed4/SharpPcap/LibPcap/PcapDeviceCaptureLoop.cs#L58

KoenPlasmansLS commented 2 years ago

My suggestion:

    public partial class PcapDevice
    {
        /// <summary>
        /// Task that is performing the background packet capture
        /// </summary>
        protected Task captureTask;

        /// <summary>
        /// Flag that indicates that a capture task should stop
        /// </summary>
        protected CancellationTokenSource taskCancellationTokenSource = new CancellationTokenSource();

        /// <summary>
        /// Flag that indicates that a capture task should stop immediately
        /// </summary>
        protected CancellationTokenSource taskImmediateCancellationTokenSource = new CancellationTokenSource();

        /// <summary>
        /// Return a value indicating if the capturing process of this adapter is started
        /// </summary>
        public virtual bool Started
        {
            get { return (captureTask != null); }
        }

        /// <summary>
        /// Maximum time within which the capture task must await for completion (on 
        /// <see cref="StopCapture"/>) or else the task is cancelled completely.
        /// </summary>
        public TimeSpan StopCaptureTimeout { get; set; } = new TimeSpan(0, 0, 1);

        /// <summary>
        /// Starts the capturing process via a background task
        /// OnPacketArrival() will be called for each captured packet
        /// </summary>
        public virtual void StartCapture()
        {
            if (!Started)
            {
                if (!Opened)
                    throw new DeviceNotReadyException("Can't start capture, the pcap device is not opened.");

                if (OnPacketArrival == null)
                    throw new DeviceNotReadyException("No delegates assigned to OnPacketArrival, no where for captured packets to go.");

                var cancellationToken = taskCancellationTokenSource.Token;
                captureTask = new Task(() =>
                {
                    try
                    {
                        using (taskImmediateCancellationTokenSource.Token.Register(Thread.CurrentThread.Abort))
                        {
                            CaptureThread(cancellationToken);
                        }
                    }
                    catch
                    {
                        //at this moment, we swallow all exceptions so calling applications won't crash due to an internal error
                    }
                }, taskImmediateCancellationTokenSource.Token);
                captureTask.Start();
            }
        }

        /// <summary>
        /// Stops the capture process
        /// </summary>
        public virtual void StopCapture()
        {
            if (Started)
            {
                taskCancellationTokenSource.Cancel();
                taskCancellationTokenSource = new CancellationTokenSource();
                LibPcapSafeNativeMethods.pcap_breakloop(Handle);
                if (captureTask != null && !captureTask.Wait(StopCaptureTimeout))
                {
                    taskImmediateCancellationTokenSource.Cancel();
                }
                captureTask= null; // otherwise we will always return true from PcapDevice.Started
            }
        }
//... rest of file
kayoub5 commented 2 years ago

@KoenPlasmansLS all the stacks have PacketHandler in them, that could be used as safety point for doing DangerousAddRef See #344

KoenPlasmansLS commented 2 years ago

At our company, we did some extensive research on the underlying issue and we feel we must add more info to the original issue being reported here: The real problem we need to see addressed is NOT the AccessViolationException! Any exception that occurs within the thread being created in PcapDeviceCaptureLoop has the potential to be unhandled and can crash any client application that uses your library! Please note that we had a different AccessViolationException in the past that now seems solved but other issues may arise when solving this one as well.

Closing this issue will only solve one of the potential causes for the crashing of our application but there are numerous others currently in the code or there could be more in the future. If you want, I can create a separate issue to address the more global solution to crashes when using this library.

You told us in the past that the problem lies within our code for closing connections in a different thread than the one handling the arrival of packages but in this library, a thread is created for capturing packages over which we have little control.

Please look at the code I suggested. If anything is unclear, please let me know and I'll talk you through and explain the difference between a thread and a task!

I also have a test application that I used to simulate potential crashes we experience. Using 6.1, the test application crashed within 5 minutes. When using the code you see above, the test application ran for 2 hours without any crashes before we deemed it stable. If you want, I can give you that code/application as well.

kayoub5 commented 2 years ago

@KoenPlasmansLS I agree about having a generic try/catch to prevent an exception inside a thread from crashing the application. This is already covered in an unreleased commit in https://github.com/chmorgan/sharppcap/commit/82167ba0345c0f5f32136db55760996cf0ae3ed4#diff-8021e0b0ce794534e7b87c4ef192dc597320cfdc4a2a64499f368e333ca5a19fR69

Where ALL exceptions inside the thread are caught to prevent an application crash, a cleaner solution using a task is possible, but does not change the fact that a fix was already introduced.

For AccessViolationException, just catching it is not going to solve the issue, since the exception means that we are reading from invalid memory and that's a something we need to prevent and not just catch. That's what #344 is for.

KoenPlasmansLS commented 2 years ago

Hi @kayoub5 , I tried out the latest master build on the test-program I've written and I still get application crashes while when using a Task, I don't get these application crashes. So even the generic try/catch inside the thread is not bulletproof from what I see.

I added the code for our test application below (note that this program was written to simulate crashes and is by no means the code we use in production.

using SharpPcap;
using SharpPcap.LibPcap;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

namespace Lansweeper.PassiveScanning.Tests
{
    public class ThreadSafeTest
    {
        private const string Filter = "somefilter"; //I changed it from our filter
        private static readonly Dictionary<string, LibPcapLiveDevice> _list = new();

        public static void Main(string[] args)
        {
            var captureDevices = LibPcapLiveDeviceList.Instance;
            var usableDevicesSorted = SortCaptureDevices(captureDevices);
            foreach (LibPcapLiveDevice device in usableDevicesSorted)
            {
                InitializeCaptureDevice(device, 0);
            }
            Console.ReadKey();
        }

        private static void InitializeCaptureDevice(LibPcapLiveDevice device, int nr)
        {
            if (nr == 5) return;

            try
            {
                if (!_list.ContainsKey(device.Name))
                    _list.Add(device.Name, device);

                device.Open(DeviceModes.Promiscuous);
                device.OnPacketArrival += Device_OnPacketArrival;
                device.Filter = Filter;
                device.StartCapture();
            }
            catch (Exception)
            {
                InitializeCaptureDevice(device, nr + 1);
            }
        }

        private static IEnumerable<LibPcapLiveDevice> SortCaptureDevices(LibPcapLiveDeviceList captureDevices)
        {
            IEnumerable<LibPcapLiveDevice> usableDevices = captureDevices.Where(device =>
                device.Interface != null &&
                device.Interface.Addresses.Count > 0 &&
                device.Interface.FriendlyName != null);

            return usableDevices
                .OrderBy(x => x.Loopback || x.Description.ToLowerInvariant().Contains("loopback"))
                .ThenBy(x => x.Interface.FriendlyName);
        }

        private static void Device_OnPacketArrival(object sender, PacketCapture e)
        {
            try
            {
                var device = (LibPcapLiveDevice)sender;
                if (device != null)
                {
                    var t = new Thread(new ParameterizedThreadStart((device) =>
                    {
                        try
                        {
                            ((ICaptureDevice)device).Dispose();
                            Thread.Sleep(10);
                            InitializeCaptureDevice(_list[((ICaptureDevice)device).Name], 0);
                        } catch(Exception ex)
                        {
                            Console.WriteLine($"Exception happened in thread:{ex}");
                        }

                    }));
                    t.Start(e.Device);
                    var data = e.GetPacket();
                }
            }
            catch (DeviceNotReadyException)
            {
                Console.WriteLine($"Exception ignored");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Exception happened:{ex}");
            }
        }
    }
}
kayoub5 commented 2 years ago

@KoenPlasmansLS you said that you made the tests on master, #344 was not merged yet at the time. I put your test in PR #345, to see if any of CI servers can reproduce the crash, since I was not able to do so

KoenPlasmansLS commented 2 years ago

@kayoub5 I definitely see an improvement. Just out of curiosity, when will a new release (candidate) be created? We're anxious to move ahead with this investigation at our customer's site.

kayoub5 commented 2 years ago

@kayoub5 I definitely see an improvement. Just out of curiosity, when will a new release (candidate) be created? We're anxious to move ahead with this investigation at our customer's site.

@chmorgan is the one who makes releases

KoenPlasmansLS commented 2 years ago

@kayoub5 I was able to investigate further and came to the following conclusion:

The culprit for us is this line: https://github.com/dotpcap/sharppcap/blob/fd4efd435dd50fb9ab35965c9193ddc2fcbcbf35/SharpPcap/RawCapture.cs#L96 which can cause an AccessViolationException which crashes our entire CLR for our program.

Changing the code to this (see below), keeps the code running but has possible side-effects. Any thoughts on this?

        public RawCapture(ICaptureDevice device, ICaptureHeader header, System.ReadOnlySpan<byte> data)
        {
            this.LinkLayerType = device.LinkType;
            this.Timeval = header.Timeval;
            GetData(data);
            this.PacketLength = Data?.Length ?? 0;
        }

        [HandleProcessCorruptedStateExceptions]
        private void GetData(System.ReadOnlySpan<byte> data)
        {
            try
            {
                this.Data = data.ToArray();
            }
            catch (Exception)
            {
                this.Data = new byte[0]; //not terrible, not great
            }
        }
KoenPlasmansLS commented 2 years ago

@kayoub5 : I just wanted to give you a heads up that the latest changes you made in https://github.com/dotpcap/sharppcap/pull/345/files fix the bugs in all our environment as it has been running for 2 weeks without anymore crashes! Big kudo's for helping us out!! Let us know if the changes are put in an official release so we can upgrade our system (which are now using a custom build).

kayoub5 commented 2 years ago

@kayoub5 : I just wanted to give you a heads up that the latest changes you made in https://github.com/dotpcap/sharppcap/pull/345/files fix the bugs in all our environment as it has been running for 2 weeks without anymore crashes! Big kudo's for helping us out!! Let us know if the changes are put in an official release so we can upgrade our system (which are now using a custom build).

Thanks for the feedback, I noticed in the CI that the crashes no longer take place in that PR with Windows, we still get crashes with Linux but I didn't get the time to debug them.

@chmorgan should we limit the scope of #345 to Windows and make a separate PR for Linux?

kayoub5 commented 1 year ago

The last bit of code needed to prevent crashes in Linux was implemented in https://github.com/dotpcap/sharppcap/pull/422 and tested in https://github.com/dotpcap/sharppcap/pull/423