dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.26k stars 4.73k forks source link

Why Dotnet call syslog(using DllImport("libc")) slower than Java(using Native.loadLibrary("c"))? #107393

Closed zhangmengyan21 closed 1 month ago

zhangmengyan21 commented 2 months ago

Description

In a Linux environment, when I use C# to call syslog to log locally, I find that compared to the results achieved with Java, C# is slower when the log length is small. C# uses DllImport("libc"), while Java uses Native.loadLibrary("c").

Configuration

Debian12(run on Podman) dotnet version:8.0.401 java version:1.8.0_221 libc version:2.36-9+deb12u7 rsyslogd:8.2302.0

Regression?

Not a regression

Data

After calling the open() method once, loop executing the syslog() method on a string of 100,000 fixed-length random contents, the following is the result: POPO-screenshot-20240905-180026 The horizontal axis is time consumed, and the vertical axis is the length of the string.

Code

part of c# code,Java code is basically the same.

    [DllImport("libc")]
    internal static extern void openlog(IntPtr ident, Option option, Facility facility);

    [DllImport("libc", ExactSpelling = true, CharSet = CharSet.Ansi, CallingConvention = CallingConvention.Cdecl)]
    internal static extern void syslog(int priority, string message);

    [DllImport("libc")]
    internal static extern void closelog();

    Stopwatch sw = new();
    sw.Start();
    openlog(tag, Option.NoDelay, Facility.LOG_LOCAL0);
    for (int i = 0; i < RUN_TIMES; i++) {
        syslog((int)Level.Info, lstMsg[i]); // lstMsg is a list of random strings of length X.
    }
    sw.Stop();
    Console.WriteLine($"{STRING_LENGTH}:" + sw.ElapsedMilliseconds);

Analysis

I have tried to change DllImport to LibraryImport and modify the parameters of each ImportAttribute, the result is always similar, I don't know if calling syslog in C# is really slow in some cases, or if there is a mistake in its usage. Using a fixed string yields similar results.

huoyaoyuan commented 2 months ago

part of c# code,Java code is basically the same.

Can you include more details about doing the syscall? They can be crucial to tell what's happening. Since a C# string is in UTF-16, there will always be a transcoding cost between UTF-8 and UTF-16.

The result is suspicious where strings of 2048 length is the fastest. I can't find any way to explain it. And also, 100000 2048 2byte = 0.4GB is large enough to cause issue for memory performance. You should use the same string instead.

jkotas commented 2 months ago

transcoding cost between UTF-8 and UTF-16.

Yes, the benchmark is testing the speed of marshalling UTF-16 strings to UTF-8 strings on interop boundary.

There are multiple strategies to do that. One possible strategy is to conservatively estimate the length of converted string, allocate the conservative amount of memory, and convert the string. Second possible strategy is to compute the exact length of the converted string, allocate the exact amount of memory and then convert it.

.NET default UTF8 marshallers use the first strategy for small strings, and the second strategy for larger strings. The second strategy is a bit slower, but it does not allocate extra memory unnecessarily. I suspect that Java uses the second strategy even for larger strings. It explains why you see .NET faster for small strings, but slower for larger strings.

If the performance of marshalling large strings is important for you and you do not care about the extra memory that it is going to use, you can write your own marshaller. You can start with the source code for the default marshaller at: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/Marshalling/Utf8StringMarshaller.cs. You will also need to switch from DllImport to LibraryImport to use these modern interop marshallers.

lstMsg is a list of random strings of length X.

Do these random strings contain Unicode characters or are they ASCII only? The performance will vary a lot between ASCII only strings and strings with Unicode characters.

zhangmengyan21 commented 2 months ago

part of c# code,Java code is basically the same.

Can you include more details about doing the syscall? They can be crucial to tell what's happening. Since a C# string is in UTF-16, there will always be a transcoding cost between UTF-8 and UTF-16.

The result is suspicious where strings of 2048 length is the fastest. I can't find any way to explain it. And also, 100000 2048 2byte = 0.4GB is large enough to cause issue for memory performance. You should use the same string instead.

Yes, this is also my second question, why is a 2048-length log the fastest. I also tried using a string with a single character, such as a string with all 'a's, and the test results were still not significantly different. Just tried using '[LibraryImport("libc", StringMarshalling = StringMarshalling.Utf8)]', and the log output was not correct:

2024-09-06T02:05:19.511052+00:00 b9640171184f server_1: a

Only a single 'a' was output, while the code should output 32 'a's, trying to find out where the problem is.

And What details do you want to see? In fact, the code is very simple, just importing libc, using LibraryImport or DllImport, and calling syslog with strings of different lengths.

zhangmengyan21 commented 2 months ago

transcoding cost between UTF-8 and UTF-16.

Yes, the benchmark is testing the speed of marshalling UTF-16 strings to UTF-8 strings on interop boundary.

There are multiple strategies to do that. One possible strategy is to conservatively estimate the length of converted string, allocate the conservative amount of memory, and convert the string. Second possible strategy is to compute the exact length of the converted string, allocate the exact amount of memory and then convert it.

.NET default UTF8 marshallers use the first strategy for small strings, and the second strategy for larger strings. The second strategy is a bit slower, but it does not allocate extra memory unnecessarily. I suspect that Java uses the second strategy even for larger strings. It explains why you see .NET faster for small strings, but slower for larger strings.

If the performance of marshalling large strings is important for you and you do not care about the extra memory that it is going to use, you can write your own marshaller. You can start with the source code for the default marshaller at: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/Marshalling/Utf8StringMarshaller.cs. You will also need to switch from DllImport to LibraryImport to use these modern interop marshallers.

lstMsg is a list of random strings of length X.

Do these random strings contain Unicode characters or are they ASCII only? The performance will vary a lot between ASCII only strings and strings with Unicode characters.

ASCII only. I also doubt whether there are different designs for the two languages in the Marshal operation, I will try to write my own marshaller. But the current test results show that c# is slower on smaller strings and faster on larger strings compared to Java.

zhangmengyan21 commented 2 months ago

The following is Java code:

interface Syslog extends Library {
    void openlog(Pointer iIdent, int iLogopt, int iFacility);

    void syslog(int iPriority, String strMessage);

    void closelog();
}

    public static void main(String[] args) {
        List<String> lstMsg = new ArrayList<>();
        // create lstMsg and insert some data
        Syslog syslog = (Syslog) Native.loadLibrary("c", Syslog.class);
        StopWatch sw = new StopWatch();
        sw.start();
        syslog.openlog(mp.get(1), Option.LOG_NDELAY, Facility.LOG_LOCAL0);
        for (int i = 0; i < RUN_TIMES; i++) {
            syslog.syslog(Priority.LOG_INFO, lstMsg.get(i));
        }
        sw.stop();
        System.out.println(STRING_LENGTH + ":" + sw.getTime(TimeUnit.MILLISECONDS));
    }
jkotas commented 2 months ago

the current test results show that c# is slower on smaller strings and faster on larger strings compared to Java.

Ah ok, I have misread your results. I thought that C# is slower for longer strings.

If you would like to get help to get to bottom of this, it would be best if you can publish a complete working repro in a scratch repo on github to make the results easily reproducible.

zhangmengyan21 commented 2 months ago

the current test results show that c# is slower on smaller strings and faster on larger strings compared to Java.

Ah ok, I have misread your results. I thought that C# is slower for longer strings.

If you would like to get help to get to bottom of this, it would be best if you can publish a complete working repro in a scratch repo on github to make the results easily reproducible.

https://github.com/zhangmengyan21/dotnet_syslog https://github.com/zhangmengyan21/java_syslog

jkotas commented 2 months ago

The test is producing very unstable numbers on my machine. I am not able to reproduce your results. For example, the numbers vary greatly depending on how long I wait before rerunning the test. The syslog services appear to have lazily flushed caches and the state of caches can influence the time to log a batch of new message greatly.

More than 98% of the time measured by your test is spent in the syslog call, outside .NET or Java runtimes. The variations that you have observed are likely caused by whatever else was going in your system and used the syslog service.

huoyaoyuan commented 1 month ago

More than 98% of the time measured by your test is spent in the syslog call, outside .NET or Java runtimes. The variations that you have observed are likely caused by whatever else was going in your system and used the syslog service.

Indeed, this is a common pitfall when benchmarking with external components.

A more reliable approach is to define a function with the same shape of the syscall, but does nothing:

// compile in C
void my_syslog(int, char*)
{
}

Then test performance with the dummy method.

zhangmengyan21 commented 1 month ago

The test is producing very unstable numbers on my machine. I am not able to reproduce your results. For example, the numbers vary greatly depending on how long I wait before rerunning the test. The syslog services appear to have lazily flushed caches and the state of caches can influence the time to log a batch of new message greatly.

More than 98% of the time measured by your test is spent in the syslog call, outside .NET or Java runtimes. The variations that you have observed are likely caused by whatever else was going in your system and used the syslog service.

yep,I tried to modify the CustomMarshaller code, replacing the return value directly with a fixed pointer. Anyway, the test used the same string, and the test results did not show a significant improvement in speed. I tested it on a relatively clean system without interference from other services. I am going to try an empty system call.

zhangmengyan21 commented 1 month ago
// compile in C
void my_syslog(int, char*)
{
}

Tested this scenario, from the perspective of C#, the time occupied is very small, compared to the previous syslog call, the time consumed can be almost negligible; however, using Java for the call is very slow, taking more than 10 times the time of C#.

Taking a string of length 512 as an example, running 100,000 times: java: my_syslog cost 106ms syslog cost 798ms c#: my_syslog cost 6ms syslog cost 986ms

I'm completely confused now T T.

huoyaoyuan commented 1 month ago

Tested this scenario, from the perspective of C#, the time occupied is very small, compared to the previous syslog call, the time consumed can be almost negligible; however, using Java for the call is very slow, taking more than 10 times the time of C#.

This is closer of the real result. Calling actual syslog function is dominated by external code, not the interop code you are using. For comparison, you can also write a C application over the actual syslog.

jkotas commented 1 month ago

syslog cost 798ms syslog cost 986ms

Does Java runtime call syslog service on its own? It is possible that Java runtime warms up the syslog service for you and your microbenchmark appears to run faster.

You may want to exclude the openlog call from the measurement and do the measurement multiple times. It should give you additional insights.

openlog(tag, Option.NoDelay, Facility.LOG_LOCAL0);
for (int i = 0; i < 5; i++)
{
    Stopwatch sw = new();
    sw.Start();
    for (int i = 0; i < RUN_TIMES; i++) {
        syslog((int)Level.Info, lstMsg[i]); // lstMsg is a list of random strings of length X.
    }
    sw.Stop();
    Console.WriteLine($"{STRING_LENGTH}:" + sw.ElapsedMilliseconds);
}
zhangmengyan21 commented 1 month ago

I found the problem by creating a lib library by myself, the code of the lib is as follows:

#include "empty.h"
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/un.h>

static int LogFile = -1;
static struct sockaddr_un addr;

void my_syslog(int i,char *msg)
{   
    char *buff;

    buff = (char *) malloc(8192);
    strcpy(buff, msg);  
    if (LogFile < 0)
    {    
        __open();
    }
    int res = send(LogFile, buff, strlen(msg), 0);
    if (res < 0) {
        printf("error send\n");
    }

    free(buff); 
}

static void __open() {
    addr.sun_family = AF_UNIX;
    (void)strncpy(addr.sun_path, "/dev/log", sizeof(addr.sun_path));
    LogFile = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    connect(LogFile, (struct sockaddr *)&addr, sizeof(addr));
}

The general function is to send log messages via AF_UNIX sockets; the rsyslog daemon will write the messages to log files. Then use Java and C# respectively to call this library again, the code here is not much different from before. The test results are as follows:

  | 32 | 128 | 512 | 1024 | 2048 | 4096 -- | -- | -- | -- | -- | -- | -- java | 135 | 146 | 206 | 233 | 300 | 433 c# | 186 | 221 | 256 | 312 | 443 | 540

If you create a socket server to write files by yourself or directly execute the fwrite method in the C code above, C# is still faster, I guess the problem lies with the rsyslog process, but I don't know the specific reason.

zhangmengyan21 commented 1 month ago

Add one more sentence, this piece of code is referencing the implementation of syslog in glibc. In theory, the previous situation where calling syslog in glibc was faster in Java than in C# should also be due to the same reason.