Tencent / xLua

xLua is a Lua programming solution for C# (Unity, .NET, Mono); it supports Android, iOS, Windows, Linux, OSX, etc.

[Suggestion] Provide an __Internal binding mode for Android in IL2CPP mode to reduce the cross-language call overhead introduced by P/Invoke #1041

Open littlesome opened 1 year ago

littlesome commented 1 year ago

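I saw the performance test results from the neighboring puerts project: https://github.com/Tencent/puerts/blob/master/doc/unity/zhcn/il2cpp/performance.md

Would you consider providing a static xlua.a library for Android + IL2CPP, so that it can be linked into il2cpp.so and reduce the cross-language call overhead introduced by P/Invoke?

Rough outline of the changes:

  1. Provide a script that builds the xlua.a static library for Android.
  2. When a macro (for example, XLUA_FORCE_INTERNAL_PINVOKE) is enabled, make the following P/Invoke declarations use __Internal: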
    public partial class Lua
    {
#if (UNITY_IPHONE || UNITY_TVOS || UNITY_WEBGL || UNITY_SWITCH) && !UNITY_EDITOR
        const string LUADLL = "__Internal";
#else
        const string LUADLL = "xlua";
#endif

chexiongsheng commented 1 year ago

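The main extra cost of dynamic linking is a symbol lookup, but that happens only once. After that, calling a function in a dynamic library is about the same as calling one in a static library: push the arguments, then jump to the address.

If dynamic and static linking differed that much, there would be no reason for dynamic libraries to exist, or the web would be flooded with performance guides telling you to avoid them.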
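
The "look up once, then jump" argument can be illustrated with a small C model. This is only a sketch under stated assumptions, not how the dynamic loader or IL2CPP actually works: `g_symbols`, `resolve_symbol`, `call_add_ints`, and `add_ints` are hypothetical names, and the linear name search merely stands in for the real symbol lookup. The lookup cost is paid on the first call; every later call is an indirect call through the cached address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical exported function standing in for a native API call. */
int add_ints(int a, int b) { return a + b; }

typedef int (*binary_fn)(int, int);

struct symbol { const char *name; binary_fn fn; };

/* Toy "export table"; a real loader uses hashed symbol tables instead. */
static const struct symbol g_symbols[] = {
    { "add_ints", add_ints },
};

unsigned g_lookup_count = 0; /* counts how often name resolution ran */

/* Slow path: search by name. Returns NULL if the symbol is unknown. */
binary_fn resolve_symbol(const char *name) {
    g_lookup_count++;
    for (size_t i = 0; i < sizeof g_symbols / sizeof g_symbols[0]; i++) {
        if (strcmp(g_symbols[i].name, name) == 0) return g_symbols[i].fn;
    }
    return NULL;
}

/* Fast path: resolve on first use, then reuse the cached address. */
int call_add_ints(int a, int b) {
    static binary_fn cached = NULL;
    if (cached == NULL) cached = resolve_symbol("add_ints");
    assert(cached != NULL); /* the toy table always contains add_ints */
    return cached(a, b);
}

/* Calls the function n times and returns the accumulated sum. */
long sum_via_cached_pointer(int n) {
    long total = 0;
    for (int i = 0; i < n; i++) total += call_add_ints(i, 1);
    return total;
}
```

After `sum_via_cached_pointer(1000)` returns, `g_lookup_count` is 1: a thousand calls, one lookup.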

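puerts has also added pure-script execution tests. Take calling a pure-script function in a for loop as an example: on Android with JIT disabled, v8 is 5x as fast as Lua, while the same test case on iOS shows v8 at half of Lua's speed. None of this involves P/Invoke at all.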

chexiongsheng commented 1 year ago

As for the gap between v8's results on iOS and Android, my initial guess is that either v8 is better optimized on Android or the compile options differ.

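chexiongsheng commented 1 year ago

The "v8 is 5x as fast as Lua on Android with JIT disabled" test was run by a colleague. His test case used 20,000 iterations, so the margin of error may be large. I just tested 50 million iterations on Android, and v8 is 1.9x as fast as Lua:

puerts for using 5333ms
xlua for using 10131.795ms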

chexiongsheng commented 1 year ago

For pure script on Windows, v8 with JIT off is also a bit slower than xLua:

puerts for using 1043ms
xlua for using 856.0ms

It looks like v8 may be unusually fast only on Android.
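
The ratios in the last two comments follow directly from the logged timings. The helpers below are a minimal C sketch with made-up names (`speedup`, `time_loop_ms`); they are not the harness used in this thread. The second function shows one way to time a loop over a caller-chosen iteration count, which matters because a 20,000-iteration run can finish so quickly that timer resolution and startup noise dominate.

```c
#include <time.h>

/* How many times faster the candidate is than the baseline,
   given elapsed times for the same workload. */
double speedup(double baseline_ms, double candidate_ms) {
    return baseline_ms / candidate_ms;
}

/* Trivial workload standing in for the script function under test. */
static long work(long i) { return i & 1; }

/* Runs the workload `iterations` times and returns the elapsed processor
   time in milliseconds as reported by clock(). The accumulated result is
   stored through `sink` so that it is observably used; an optimizing
   compiler may still simplify a toy workload like this one. */
double time_loop_ms(long iterations, volatile long *sink) {
    clock_t start = clock();
    long acc = 0;
    for (long i = 0; i < iterations; i++) acc += work(i);
    *sink = acc;
    return 1000.0 * (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

With the figures reported here, `speedup(10131.795, 5333.0)` is about 1.9 on Android, while on Windows `speedup(856.0, 1043.0)` is about 0.82, meaning jitless v8 ran at roughly 82% of xLua's speed.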

chexiongsheng commented 1 year ago

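The cause turned out to be a build problem with xLua on Android. After upgrading the NDK, Lua performance improves considerably. In the for-loop test on Android, Lua is faster than jitless v8 on a Huawei Kirin CPU, and slightly slower than jitless v8 on a Qualcomm CPU, though close (previously, on Qualcomm, it tested at half the speed of jitless v8).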

littlesome commented 1 year ago

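Got it.

In our tests, libxlua.so and xlua.a (linked into libil2cpp.so) perform about the same; statically linking into libil2cpp.so is actually slightly slower.

However, we found that IL2CPP on arm64 (with C++ Compiler Configuration set to Master) does not enable LTO either, so we need to test further with LTO forced on.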

littlesome commented 1 year ago

Test results:

[Image: screenshot of the test results]

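With static compilation plus LTO enabled, performance improves significantly (the last two results dropped; the cause is still to be investigated). Test environment:

Changes made to enable LTO: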

  1. xLua CMakeLists.txt
 build/CMakeLists.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/build/CMakeLists.txt b/build/CMakeLists.txt
index edf51f5..874b624 100644
--- a/build/CMakeLists.txt
+++ b/build/CMakeLists.txt
@@ -189,13 +189,15 @@ elseif ("${CMAKE_SYSTEM_NAME}" STREQUAL "Switch")
     )
     target_compile_options(xlua PRIVATE -m64 -mcpu=cortex-a57+fp+simd+crypto+crc -fno-common -fno-short-enums -ffunction-sections -fdata-sections -fPIC -fms-extensions)
 else ( )
-    add_library(xlua SHARED
+    add_library(xlua STATIC
         ${LUA_CORE}
         ${LUA_LIB}
         ${LUA_SOCKET}
         ${XLUA_CORE}
         ${THIRDPART_SRC}
     )
+    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -flto")
+    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -flto")
 endif ( )

 if ( WIN32 AND NOT CYGWIN )
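  2. Unity: you need to decompile Editor\Data\il2cpp\build\deploy\net471\Unity.IL2CPP.Building.dll and patch it (use lld as the linker, and pass the LTO compile and link flags). If you are also on 2019, you can replace the DLL with mine: Unity.IL2CPP.Building.zip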
 ToolChains/Android/AndroidNDKUtilities.cs | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/ToolChains/Android/AndroidNDKUtilities.cs b/ToolChains/Android/AndroidNDKUtilities.cs
index 52fdc3a..11eecdb 100644
--- a/ToolChains/Android/AndroidNDKUtilities.cs
+++ b/ToolChains/Android/AndroidNDKUtilities.cs
@@ -453,7 +453,15 @@ namespace Unity.IL2CPP.Building.ToolChains.Android
        // Token: 0x0600026C RID: 620 RVA: 0x0000BD5B File Offset: 0x00009F5B
        public IEnumerable<string> GetArchitectureLinkerFlags(BuildConfiguration configuration)
        {
-           string str = this.CanUseGoldLinker(configuration) ? "gold" : "bfd";
+           string str = "bfd";
+           if (this.CanUseGoldLinker(configuration))
+           {
+               str = "gold";
+           }
+           else if (configuration == BuildConfiguration.ReleasePlus)
+           {
+               str = "lld";
+           }
            string str2 = PlatformUtils.IsWindows() ? ".exe" : string.Empty;
            yield return "-fuse-ld=" + str + str2;
            yield break;
 ToolChains/AndroidToolChain.cs | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/ToolChains/AndroidToolChain.cs b/ToolChains/AndroidToolChain.cs
index 392cb62..8a8e76b 100644
--- a/ToolChains/AndroidToolChain.cs
+++ b/ToolChains/AndroidToolChain.cs
@@ -315,6 +315,10 @@ namespace Unity.IL2CPP.Building.ToolChains
                yield return "-fPIC";
            }
            yield return (base.BuildConfiguration == BuildConfiguration.Debug) ? "-O0" : "-Os";
+           if (base.BuildConfiguration == BuildConfiguration.ReleasePlus)
+           {
+               yield return "-flto";
+           }
            if (this.AndroidNDK.GnuBinutils)
            {
                yield return "--sysroot " + this.AndroidNDK.CompilerSysRoot.InQuotes();
@@ -405,10 +409,22 @@ namespace Unity.IL2CPP.Building.ToolChains
            IEnumerator<string> enumerator2 = null;
            yield return "-llog";
            yield return "-rdynamic";
-           if (base.BuildConfiguration == BuildConfiguration.ReleasePlus && this.AndroidNDK.CanUseGoldLinker(base.BuildConfiguration))
+           if (base.BuildConfiguration == BuildConfiguration.ReleasePlus)
            {
-               yield return "-Wl,--icf=safe";
-               yield return "-Wl,--icf-iterations=5";
+               if (this.AndroidNDK.CanUseGoldLinker(base.BuildConfiguration))
+               {
+                   yield return "-Wl,--icf=safe";
+                   yield return "-Wl,--icf-iterations=5";
+               }
+               else
+               {
+                   if (PlatformUtils.IsWindows())
+                   {
+                       yield return "-Wl,--no-threads";
+                   }
+                   yield return "-flto";
+                   yield return "-Wl,--icf=safe";
+               }
            }
            foreach (string text in this.AndroidNDK.GetArchitectureLinkerFlags(base.BuildConfiguration))
            {

whitecostume commented 1 year ago

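[…]ua performance improves considerably. In the for-loop test on Android, Lua is faster than jitless v8 on a Huawei Kirin CPU, and slightly slower than jitless v8 on a Qualcomm CPU, though close (previously, on Qualcomm, compared with jitless […]

Which NDK version did you upgrade to?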

chexiongsheng commented 1 year ago

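In the end it turned out that the NDK probably has nothing to do with it; the cause is more likely the GitHub Actions VM environment that was used when that release was built. I later reverted to the original NDK in my own fork, rebuilt, and performance was normal there too. So the fix most likely came not from the NDK version itself, but from the fact that changing the NDK triggered a fresh build on the current Actions VM.

I tested these two releases, and both are very slow (half the normal figure):
https://github.com/Tencent/xLua/releases/tag/v2.1.16_with_silicon_support
https://github.com/Tencent/xLua/releases/tag/v2.1.16_newest_luajit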